Abstract. We obtain new transport-entropy inequalities and, as a by-product, new deviation estimates for the laws of two kinds of discrete stochastic approximation schemes. The first one refers to the law of an Euler like discretization scheme of a diffusion process at a fixed deterministic date and the second one concerns the law of a stochastic approximation algorithm at a given time-step. Our results notably improve and complete those obtained in [FM12]. The key point is to properly quantify the contribution of the diffusion term to the concentration regime. We also derive a general non-asymptotic deviation bound for the difference between a function of the trajectory of a continuous Euler scheme associated to a diffusion process and its mean. Finally, we obtain non-asymptotic bounds for stochastic approximation with averaging of trajectories; in particular, we prove that averaging a stochastic approximation algorithm with a slowly decreasing step sequence gives rise to an optimal concentration rate.
1. Introduction

In this work, we derive transport-entropy inequalities and, as a consequence, non-asymptotic deviation estimates for the laws at a given time step of two kinds of discrete-time and $d$-dimensional stochastic evolution schemes of the form

$$X_{n+1} = X_n + \gamma_{n+1} H(n, X_n, U_{n+1}), \quad n \ge 0, \quad X_0 = x \in \mathbb{R}^d, \tag{1.1}$$

where $(\gamma_n)_{n \ge 1}$ is a deterministic positive sequence of time steps, the $(U_i)_{i \in \mathbb{N}^*}$ are i.i.d. $\mathbb{R}^q$-valued random variables defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with law $\mu$ and the function $H : \mathbb{N} \times \mathbb{R}^d \times \mathbb{R}^q \to \mathbb{R}^d$ is a measurable function satisfying for all $x \in \mathbb{R}^d$, for all $n \in \mathbb{N}$, $H(n, x, .) \in L^1(\mu)$, and $\mu(du)$-a.s., $H(n, ., u)$ is continuous. Here and below, we will also assume that $\mu$ satisfies a Gaussian concentration property, that is there exists $\beta > 0$ such that for every real-valued 1-Lipschitz function $f$ defined on $\mathbb{R}^q$ and for all $\lambda \ge 0$:

$$\mathbb{E}[\exp(\lambda f(U_1))] \le \exp\Big(\lambda \mathbb{E}[f(U_1)] + \frac{\beta \lambda^2}{4}\Big). \tag{GC($\beta$)}$$
It is well known that (GC($\beta$)) implies the following deviation bound

$$\mathbb{P}\big[f(U_1) - \mathbb{E}[f(U_1)] \ge r\big] \le \exp\Big(-\frac{r^2}{\beta}\Big), \quad \forall r \ge 0.$$

Examples of random variables satisfying this property include Gaussians, as well as bounded random variables. A characterization of (GC($\beta$)) is given by the Gaussian tail of $U_1$, that is there exists $\varepsilon > 0$ such that $\mathbb{E}[\exp(\varepsilon |U_1|^2)] < +\infty$, see e.g. Bolley and Villani [BV05]. The two claims are actually equivalent.
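
For completeness, the deviation bound can be recovered from (GC($\beta$)) by the usual exponential Markov (Chernoff) argument. The following is only a short sketch, for a 1-Lipschitz function $f$ and $r \ge 0$:

```latex
\begin{align*}
\mathbb{P}\big[f(U_1) - \mathbb{E}[f(U_1)] \ge r\big]
  &\le \inf_{\lambda \ge 0} e^{-\lambda r}\,
       \mathbb{E}\Big[\exp\big(\lambda (f(U_1) - \mathbb{E}[f(U_1)])\big)\Big] \\
  &\le \inf_{\lambda \ge 0} \exp\Big(-\lambda r + \frac{\beta \lambda^2}{4}\Big)
   = \exp\Big(-\frac{r^2}{\beta}\Big).
\end{align*}
```

The infimum is attained at $\lambda = 2r/\beta$.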

We are interested in furthering the discussion, initiated in [FM12], about giving non-asymptotic deviation bounds for two specific problems related to evolution schemes of the form (1.1). The first one is the deviation between a function of an Euler like discretization scheme of a diffusion process at a fixed deterministic date and its mean. The second one refers to the deviation between a stochastic approximation algorithm at a given time-step and its target. Under some mild assumptions, in particular the assumption that the function $u \mapsto H(n, x, u)$ is Lipschitz uniformly in space and time, it is proved in [FM12] that both recursive schemes share the Gaussian concentration property of the innovation.

In the present work, we point out the contribution of the diffusion term to the concentration rate, which to our knowledge is new. This covers many situations and gives rise to different regimes ranging from exponential to Gaussian. We also derive a general non-asymptotic deviation bound for the difference between a function of the trajectory of a continuous Euler scheme associated to a diffusion process and its mean. It turns out that, under mild assumptions, the concentration regime is log-normal. Finally, we study non-asymptotic deviation bounds for stochastic approximation with averaging of trajectories according to the averaging principle of Ruppert & Polyak, see e.g. [Rup91] and [PJ92].

1.1. Euler like Scheme of a Diffusion Process

We consider a Brownian diffusion process $(X_t)_{t \ge 0}$ defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$, satisfying the usual conditions, and solution to the following stochastic differential equation (SDE)

$$X_t = x + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \tag{SDE$_{b,\sigma}$}$$

where $(W_t)_{t \ge 0}$ is a $q$-dimensional $(\mathcal{F}_t)_{t \ge 0}$ Brownian motion and the coefficients $b, \sigma$ are assumed to be uniformly Lipschitz continuous in space and measurable in time.

A basic problem in Numerical Probability is to compute quantities like $\mathbb{E}_x[f(X_T)]$ for a given Lipschitz continuous function $f$ and a fixed deterministic time horizon $T$ using Monte Carlo simulation. For instance, it appears in mathematical finance and represents the price of a European option with maturity $T$ when the dynamics of the underlying asset is given by (SDE$_{b,\sigma}$). Under suitable assumptions on the function $f$ and the coefficients $b, \sigma$, namely smoothness or non degeneracy, it can also be related to the Feynman-Kac representation of the heat equation associated to the generator of $X$. To this end, we first introduce some discretization schemes of (SDE$_{b,\sigma}$) that can be easily simulated. For a fixed time step $\Delta = T/N$, $N \in \mathbb{N}^*$, we set $t_i := i\Delta$, for all $i \in \mathbb{N}$ and define an Euler like scheme by

$$X_0^{\Delta} = x, \quad \forall i \in [\![0, N-1]\!], \quad X_{t_{i+1}}^{\Delta} = X_{t_i}^{\Delta} + b(t_i, X_{t_i}^{\Delta})\Delta + \sigma(t_i, X_{t_i}^{\Delta})\Delta^{1/2} U_{i+1}, \tag{1.2}$$

where $(U_i)_{i \in \mathbb{N}^*}$ is a sequence of $\mathbb{R}^q$-valued i.i.d. random variables with law $\mu$ satisfying: $\mathbb{E}[U_1] = 0_q$, $\mathbb{E}[U_1 U_1^*] = I_q$, where $U_1^*$ denotes the transpose of the column vector $U_1$ and $0_q, I_q$ respectively denote the zero vector of $\mathbb{R}^q$ and the identity matrix of $\mathbb{R}^q \otimes \mathbb{R}^q$. We also assume that $\mu$ satisfies (GC($\beta$)) for some $\beta > 0$. The main advantage of such a situation is that it includes the case of the standard Euler scheme where $U_1 \overset{d}{=} \mathcal{N}(0, I_q)$ (satisfying (GC($\beta$)) with $\beta = 2$) and the case of the Bernoulli law where $U_1 \overset{d}{=} (B_1, \dots, B_q)$, $(B_k)_{k \in [\![1, q]\!]}$ are i.i.d. random variables with law $\mu = \frac{1}{2}(\delta_{-1} + \delta_1)$, which turns out to be one of the only realistic options when the dimension is large.
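
To make the recursion (1.2) concrete, here is a minimal sketch in Python for the scalar case $d = q = 1$. The function name `euler_like_scheme` and its signature are illustrative choices, not notation from the text; the two innovation laws are the Gaussian and symmetric Bernoulli cases mentioned above.

```python
import math
import random


def euler_like_scheme(x0, b, sigma, T, N, innovation="gaussian", rng=None):
    """Return one draw of X_T for the scalar scheme (1.2), i.e. d = q = 1.

    b and sigma are callables (t, x) -> float.
    """
    if rng is None:
        rng = random.Random()
    delta = T / N                          # time step Delta = T / N
    sqrt_delta = math.sqrt(delta)
    x = x0
    for i in range(N):
        t_i = i * delta
        if innovation == "gaussian":       # standard Euler scheme, U ~ N(0, 1)
            u = rng.gauss(0.0, 1.0)
        elif innovation == "bernoulli":    # symmetric +/-1 law: mean 0, variance 1
            u = rng.choice((-1.0, 1.0))
        else:
            raise ValueError("innovation must be 'gaussian' or 'bernoulli'")
        x = x + b(t_i, x) * delta + sigma(t_i, x) * sqrt_delta * u
    return x
```

Both innovation laws are centered with unit variance, as required of $U_1$; the Bernoulli option only needs one random sign per step and per coordinate.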
The weak error $\mathcal{E}_D(f, \Delta, T, b, \sigma) = \mathbb{E}_x[f(X_T)] - \mathbb{E}_x[f(X_T^{\Delta})]$ corresponds to the discretization error when replacing the diffusion $X$ by its Euler scheme $X^{\Delta}$ for the computation of $\mathbb{E}_x[f(X_T)]$. It has been widely investigated in the literature. Since the seminal work of [TT90], it is known that, under smoothness assumptions on the coefficients $b, \sigma$, the standard Euler scheme produces a weak error of order $\Delta$. In a hypoelliptic setting for the coefficients $b$ and $\sigma$ and for a bounded measurable function $f$, Bally and Talay obtained the expected order using Malliavin calculus. For a uniformly elliptic diffusion coefficient $\sigma\sigma^*$ and if $b, \sigma$ are three times continuously differentiable, the same order for the weak error is established in [KM02]. The same order $\Delta$ is still valid in the situation where the Gaussian increments are replaced by (not necessarily continuous) random variables $(U_i)_{1 \le i \le N}$ having the same covariance matrix and odd moments up to order 5 as the law $\mathcal{N}(0, I_q)$ and if $b, \sigma, f$ are smooth enough. Let us finally mention the recent work [AKHJ12] where the authors study the weak trajectorial error using coupling techniques. More precisely, they prove that the Wasserstein distance between the law of a uniformly elliptic and one-dimensional diffusion process and the law of its continuous Euler scheme $X^{\Delta}$ with time step $\Delta := T/N$ is smaller than $O(N^{-2/3+\varepsilon})$, $\forall \varepsilon > 0$.

The expansion of $\mathcal{E}_D$ also allows one to improve the convergence rate of the discretization error using Richardson-Romberg extrapolation techniques, see e.g. [TT90].
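
As an illustration of the extrapolation idea, suppose (this is an assumption made for the sketch) that the weak error admits a first-order expansion $\mathcal{E}_D(\Delta) = C_1 \Delta + O(\Delta^2)$, where $C_1$ is a placeholder name for a constant independent of $\Delta$ and the dependence on $(f, T, b, \sigma)$ is dropped from the notation. Combining the schemes with steps $\Delta$ and $\Delta/2$ then cancels the leading term:

```latex
\begin{align*}
2\,\mathbb{E}_x\big[f(X_T^{\Delta/2})\big] - \mathbb{E}_x\big[f(X_T^{\Delta})\big]
  &= \mathbb{E}_x[f(X_T)] - 2\,\mathcal{E}_D(\Delta/2) + \mathcal{E}_D(\Delta) \\
  &= \mathbb{E}_x[f(X_T)] - 2\,C_1\frac{\Delta}{2} + C_1\Delta + O(\Delta^2)
   = \mathbb{E}_x[f(X_T)] + O(\Delta^2).
\end{align*}
```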

In order to have a global control of the numerical procedure for the computation of $\mathbb{E}_x[f(X_T)]$, it remains to approximate the expectation $\mathbb{E}_x[f(X_T^{\Delta})]$ using a Monte Carlo estimator $M^{-1} \sum_{j=1}^{M} f((X_T^{\Delta})^j)$ where the $((X_T^{\Delta})^j)_{j \in [\![1, M]\!]}$ are $M$ independent copies of the scheme (1.2) starting at the initial value $x$ at time 0. This gives rise to an empirical error defined by $\mathcal{E}_{Emp}(M, f, \Delta, T, b, \sigma) = \mathbb{E}_x[f(X_T^{\Delta})] - M^{-1} \sum_{j=1}^{M} f((X_T^{\Delta})^j)$.

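Consequently, the global error associated to the computation of $\mathbb{E}_x[f(X_T)]$ writes as

$$\mathcal{E}_{Glob}(M, \Delta) = \mathbb{E}_x[f(X_T)] - \mathbb{E}_x[f(X_T^{\Delta})] + \mathbb{E}_x[f(X_T^{\Delta})] - \frac{1}{M} \sum_{j=1}^{M} f((X_T^{\Delta})^j) := \mathcal{E}_D(f, \Delta, T, b, \sigma) + \mathcal{E}_{Emp}(M, f, \Delta, T, b, \sigma).$$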

It is well-known that if $f(X_T^{\Delta})$ belongs to $L^2(\mathbb{P})$ the central limit theorem provides an asymptotic rate of convergence of order $M^{1/2}$. If $f(X_T^{\Delta}) \in L^3(\mathbb{P})$, a non-asymptotic result is given by the Berry-Esseen theorem. However, in practical implementation, one is interested in obtaining deviation bounds in probability for a fixed $M$ and a given threshold $r > 0$, that is explicitly controlling the quantity $\mathbb{P}(\mathcal{E}_{Emp}(M, \Delta) \ge r)$.
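
The Monte Carlo step can be sketched as follows. This is only an illustration in the scalar Gaussian case; `monte_carlo_estimate` and `make_euler_sampler` are hypothetical helper names, and the drift, diffusion coefficient and test function in the usage part are arbitrary examples rather than models taken from the text.

```python
import math
import random


def monte_carlo_estimate(sampler, f, M):
    """Empirical mean (1/M) * sum_{j=1..M} f(Y_j), with Y_j = sampler() i.i.d."""
    if M < 1:
        raise ValueError("M must be at least 1")
    return sum(f(sampler()) for _ in range(M)) / M


def make_euler_sampler(x0, b, sigma, T, N, rng):
    """Return a zero-argument sampler of X_T for the scalar Gaussian Euler scheme."""
    delta = T / N

    def draw():
        x = x0
        for i in range(N):
            t_i = i * delta
            x += b(t_i, x) * delta + sigma(t_i, x) * math.sqrt(delta) * rng.gauss(0.0, 1.0)
        return x

    return draw


if __name__ == "__main__":
    rng = random.Random(42)
    # Arbitrary example: dX_t = -X_t dt + dW_t, f(y) = |y|, T = 1, N = 50.
    sampler = make_euler_sampler(1.0, lambda t, x: -x, lambda t, x: 1.0, 1.0, 50, rng)
    print(monte_carlo_estimate(sampler, abs, 10_000))
```

The empirical error is the difference between the (unknown) expectation and the value returned by `monte_carlo_estimate`; the deviation bounds discussed here control how large that difference can be for a fixed `M`.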

In this context, Malrieu and Talay [MT06] obtained Gaussian deviation bounds in an ergodic framework and for a constant diffusion coefficient. Concerning the standard Euler scheme, Menozzi and Lemaire [LM10] obtained two-sided Gaussian bounds up to a systematic bias under the assumptions that the diffusion coefficient is uniformly elliptic, $\sigma\sigma^*$ is Hölder-continuous, bounded and that $b$ is bounded. Frikha and Menozzi [FM12], getting rid of the non-degeneracy assumption on $\sigma$, recently obtained Gaussian deviation bounds under the mild smoothness condition that $b, \sigma$ are uniformly Lipschitz-continuous in space (uniformly in time) and that $\sigma$ is bounded. The main tool of their analysis is to exploit decompositions similar to those used in [TT90] for the analysis of the weak error. It should be noted that it is the boundedness of $\sigma$ that gives rise to the Gaussian concentration regime for the deviation of the empirical error.

Using optimal transportation techniques, Blower and Bolley [BB06] obtained Gaussian concentration inequalities and transportation inequalities for the joint law of the first $n$ positions of a stochastic process with state space some Polish space. However, continuity assumptions in Wasserstein metric need to be checked, which can be hard in practice, see conditions (ii) in their Theorems 1.1, 1.2 and 2.1. The authors provide a computable sufficient condition which notably requires the smoothness of the transition law, see Proposition 2.2 in [BB06].

In the current work, we get rid of the boundedness of $\sigma$ and we only need the Gaussian concentration property of the innovation. We suppose that the coefficients satisfy the following smoothness and domination assumptions

(HS) The coefficients $b, \sigma$ are uniformly Lipschitz continuous in space uniformly in time.

(HD$_\alpha$) There exists a $C^2(\mathbb{R}^d, \mathbb{R}_+^*)$ function $V$ satisfying $\exists C_V > 0$, $|\nabla V|^2 \le C_V V$, $\eta := \frac{1}{2} \sup_{x \in \mathbb{R}^d} \|\nabla^2 V(x)\| < +\infty$ and $\exists \alpha \in (0, 1]$, such that for all $x \in \mathbb{R}^d$,

$$\exists C_b > 0, \ \sup_{t \in [0,T]} |b(t,x)|^2 \le C_b V(x), \qquad \exists C_\sigma > 0, \ \sup_{t \in [0,T]} \mathrm{Tr}(a(t,x)) \le C_\sigma V^{1-\alpha}(x),$$

where $a = \sigma\sigma^*$.

The idea behind assumption (HD$_\alpha$) is to parameterize the growth of the diffusion coefficient in order to quantify its contribution to the concentration regime. Indeed, under (HS) and (HD$_\alpha$), with $\alpha \in [1/2, 1]$, and if the innovations satisfy (GC($\beta$)), for some positive $\beta$, we derive non-asymptotic deviation bounds for the statistical error $\frac{1}{M} \sum_{k=1}^{M} f((X_T^{\Delta})^k) - \mathbb{E}_x[f(X_T^{\Delta})]$ ranging from exponential (if $\alpha = 1/2$) to Gaussian (if $\alpha = 1$) regimes. Therefore, we greatly improve the results obtained in [FM12].

Our approach here is different from [FM12]. Indeed, in [FM12], the key tool consists in writing the deviation using the same kind of decompositions that are exploited in [TT90] for the analysis of the discretization error. In the current work, we will use the fact that the Euler-like scheme (1.2) defines an inhomogeneous Markov chain having Feller transitions $P_k$, $k = 0, \dots, N-1$, defined for non negative or bounded Borel function $f : \mathbb{R}^d \to \mathbb{R}$ by

$$P_k(f)(x) := \mathbb{E}\big[f(X_{t_{k+1}}^{\Delta}) \,\big|\, X_{t_k}^{\Delta} = x\big] = \mathbb{E}\big[f\big(x + b(t_k, x)\Delta + \sigma(t_k, x)\Delta^{1/2} U\big)\big], \quad k = 0, \dots, N-1.$$

For every $k, p \in \{0, \dots, N-1\}$, $k < p$, we also define the iterative kernels for a non negative or bounded Borel function $f : \mathbb{R}^d \to \mathbb{R}$

$$P_{k,p}(f)(x) := P_k \circ \cdots \circ P_{p-1}(f)(x) = \mathbb{E}\big[f(X_{t_p}^{\Delta}) \,\big|\, X_{t_k}^{\Delta} = x\big].$$

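For a 1-Lipschitz function $f$ and $\lambda \ge 0$, using that the law $\mu$ of the innovation satisfies (GC($\beta$)) for some positive $\beta$, we obtain

$$P_{N-1}(\exp(\lambda f))(x) = \mathbb{E}\Big[\exp\Big(\lambda f\big(x + b(t_{N-1}, x)\Delta + \sigma(t_{N-1}, x)\Delta^{1/2} U\big)\Big)\Big] \le \exp\Big(\lambda P_{N-1}(f)(x) + \beta \frac{\lambda^2}{4} \Delta |\sigma(t_{N-1}, x)|^2\Big).$$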

If $\sigma$ is bounded, the Gaussian concentration property will readily follow provided the iterated kernel functions $P_{k,p}(f)$ are uniformly Lipschitz. Under the mild smoothness assumption (HS), this can be easily derived, see Proposition 3.2. Otherwise, using (HD$_\alpha$), we obtain

$$P_{N-1}(\exp(\lambda f))(x) \le \exp\Big(\lambda P_{N-1}(f)(x) + \beta \frac{\lambda^2}{4} C_\sigma \Delta V^{1-\alpha}(x)\Big). \tag{1.3}$$
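
Both one-step bounds can be checked directly from (GC($\beta$)). In the sketch below, the auxiliary map $g$ is a name introduced only for illustration, and $|\sigma|$ is understood as the Frobenius norm, so that $|\sigma|^2 = \mathrm{Tr}(a)$ (an assumption on the choice of matrix norm):

```latex
\begin{align*}
g(u) &:= f\big(x + b(t_{N-1},x)\Delta + \sigma(t_{N-1},x)\Delta^{1/2}u\big),
  \qquad |g(u) - g(u')| \le \Delta^{1/2}\,|\sigma(t_{N-1},x)|\,|u - u'|, \\
P_{N-1}(\exp(\lambda f))(x) &= \mathbb{E}[\exp(\lambda g(U))]
  \le \exp\Big(\lambda\,\mathbb{E}[g(U)] + \beta\frac{\lambda^2}{4}\,\Delta\,|\sigma(t_{N-1},x)|^2\Big), \\
|\sigma(t_{N-1},x)|^2 &= \mathrm{Tr}\big(a(t_{N-1},x)\big) \le C_\sigma V^{1-\alpha}(x)
  \quad \text{under (HD}_\alpha\text{)}.
\end{align*}
```

The second line applies (GC($\beta$)) to the 1-Lipschitz function $g/\ell$ with parameter $\lambda \ell$, where $\ell = \Delta^{1/2} |\sigma(t_{N-1}, x)|$ (the case $\ell = 0$ being trivial), and uses $\mathbb{E}[g(U)] = P_{N-1}(f)(x)$.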



The last inequality is the first step of our analysis. To investigate the empirical error, the key idea is to exploit recursively from (1.3) that the increments of the scheme (1.2) satisfy (GC($\beta$)) and to adequately quantify the contribution of the diffusion term $V^{1-\alpha}(x)$ to the concentration rate. Under (HS) and (HD$_\alpha$), the latter is addressed using flow techniques and integrability results on the law of the scheme (1.2), see Propositions 3.1 and



1.2. Stochastic Approximation Algorithm 

Beyond concentration bounds of the empirical error for Euler-like schemes, we want to look at non-asymptotic bounds for stochastic approximation algorithms. Introduced by H. Robbins and S. Monro [RM51], these recursive algorithms aim at finding a zero of a continuous function $h : \mathbb{R}^d \to \mathbb{R}^d$ which is unknown to the experimenter but can only be estimated through experiments. Successfully and widely investigated since this seminal work, such procedures are now commonly used in various contexts such as convex optimization since minimizing a function amounts to finding a zero of its gradient.

To be more specific, the aim of such an algorithm is to find a solution $\theta^*$ to the equation $h(\theta) := \mathbb{E}[H(\theta, U)] = 0$, where $H : \mathbb{R}^d \times \mathbb{R}^q \to \mathbb{R}^d$ is a Borel function and $U$ is a given $\mathbb{R}^q$-valued random variable with law $\mu$. The function $h$ is generally not computable, at least at a reasonable cost. Actually, it is assumed that the computation of $h$ is costly compared to the computation of $H$ for any couple $(\theta, u) \in \mathbb{R}^d \times \mathbb{R}^q$ and to the simulation of the random variable $U$.

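A stochastic approximation algorithm corresponds to the following simulation-based recursive scheme

$$\theta_{n+1} = \theta_n - \gamma_{n+1} H(\theta_n, U_{n+1}), \quad n \ge 0, \quad \theta_0 \in \mathbb{R}^d, \tag{1.4}$$

where $(U_n)_{n \ge 1}$ is an i.i.d. $\mathbb{R}^q$-valued sequence of random variables with law $\mu$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and $(\gamma_n)_{n \ge 1}$ is a sequence of non-negative deterministic steps satisfying the usual assumption

$$\sum_{n \ge 1} \gamma_n = +\infty, \quad \text{and} \quad \sum_{n \ge 1} \gamma_n^2 < +\infty. \tag{1.5}$$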

When the function $h$ is the gradient of a potential, the recursive procedure (1.4) is a stochastic gradient algorithm. Indeed, replacing $H(\theta_n, U_{n+1})$ by $h(\theta_n)$ in (1.4) leads to the usual deterministic gradient descent method. When $h(\theta) = M(\theta) - \ell$, $\theta \in \mathbb{R}$, where $M$ is a monotone function, say increasing, we can write $M(\theta) = \mathbb{E}[N(\theta, U)]$ where $N : \mathbb{R} \times \mathbb{R}^q \to \mathbb{R}$ is a Borel function and $\ell$ is a given constant such that the equation $M(\theta) = \ell$ has a solution. Setting $H = N - \ell$, the recursive procedure (1.4) then corresponds to the seminal Robbins-Monro algorithm and aims at computing the level $\ell$ of the function $M$.

The key idea of stochastic approximation algorithms is to take advantage of an averaging effect along the scheme due to the specific form of $h(\theta) := \mathbb{E}[H(\theta, U)]$. This allows one to avoid the numerical integration of $h$ at each step of a classical first-order optimization algorithm.
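
The recursion (1.4) can be sketched in a few lines of Python for $d = q = 1$. The function name `robbins_monro`, the Gaussian innovation and the toy function $H$ in the usage part are assumptions made for the illustration; the step $\gamma_n = 1/n$ satisfies (1.5).

```python
import random


def robbins_monro(H, theta0, n_steps, step, rng=None):
    """Run n_steps iterations of (1.4) in dimension one and return theta_{n_steps}.

    H is a callable (theta, u) -> float; step is a callable n -> gamma_n, n >= 1.
    """
    if rng is None:
        rng = random.Random()
    theta = theta0
    for n in range(1, n_steps + 1):
        u = rng.gauss(0.0, 1.0)              # innovation U_n (Gaussian: an assumption)
        theta = theta - step(n) * H(theta, u)
    return theta


if __name__ == "__main__":
    # Toy problem: H(theta, u) = theta - 1 - u, so h(theta) = theta - 1 and theta* = 1.
    est = robbins_monro(lambda th, u: th - 1.0 - u, 5.0, 20_000,
                        lambda n: 1.0 / n, random.Random(1))
    print(est)
```

Only evaluations of $H$ at simulated innovations are used; the mean function $h$ is never computed, which is the averaging effect described above.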

In the present paper, we make no attempt to provide a general discussion concerning convergence results of stochastic approximation algorithms. We refer readers to [Duf96], [KY03] for some general results on the a.s. convergence of such procedures under the existence of a so-called Lyapunov function, i.e. a continuously differentiable function $L : \mathbb{R}^d \to \mathbb{R}_+$ such that $\nabla L$ is Lipschitz, $|\nabla L|^2 \le C(1 + L)$ for some positive constant $C$ and

$$\langle \nabla L, h \rangle \ge 0.$$

See also [LP12] for a convergence theorem under the existence of a pathwise Lyapunov function. For the sake of simplicity, in the sequel it is assumed that $\theta^*$ is the unique solution of the equation $h(\theta) = 0$ and that the sequence $(\theta_n)_{n \ge 0}$ defined by (1.4) converges a.s. towards $\theta^*$.



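We assume that the law $\mu$ of the innovation satisfies (GC($\beta$)) for some $\beta > 0$ and that the step sequence $(\gamma_n)_{n \ge 1}$ satisfies (1.5). We also suppose that the following assumptions on the function $H$ are in force:

(HL) For all $u \in \mathbb{R}^q$, the function $H(., u)$ is Lipschitz-continuous with a Lipschitz modulus having linear growth in the variable $u$, that is:

$$\exists C_H > 0, \ \forall u \in \mathbb{R}^q, \quad \sup_{(\theta, \theta') \in (\mathbb{R}^d)^2,\ \theta \ne \theta'} \frac{|H(\theta, u) - H(\theta', u)|}{|\theta - \theta'|} \le C_H (1 + |u|).$$

(HLS)$_\alpha$ (Lyapunov Stability-Domination) There exists a $C^2(\mathbb{R}^d, \mathbb{R}_+^*)$ function $L$ satisfying $\exists C_L > 0$, $|\nabla L|^2 \le C_L L$, $\eta := \frac{1}{2} \sup_{x \in \mathbb{R}^d} \|\nabla^2 L(x)\| < +\infty$ such that

$$\forall \theta \in \mathbb{R}^d, \ \langle h(\theta), \nabla L(\theta) \rangle \ge 0, \quad \text{and} \quad \exists C_h > 0, \ \forall \theta \in \mathbb{R}^d, \ |h(\theta)|^2 \le C_h L(\theta),$$

and $\exists \alpha \in (0, 1]$,

$$\exists C_\alpha > 0, \ \forall \theta \in \mathbb{R}^d, \quad \sup_{(u, u') \in (\mathbb{R}^q)^2,\ u \ne u'} \frac{|H(\theta, u) - H(\theta, u')|}{|u - u'|} \le C_\alpha L^{\frac{1-\alpha}{2}}(\theta).$$

(HUA) (Uniform Attractivity) The map $h : \theta \in \mathbb{R}^d \mapsto \mathbb{E}[H(\theta, U)]$ is continuously differentiable in $\theta$ and there exists $\underline{\lambda} > 0$ s.t. $\forall \theta \in \mathbb{R}^d$, $\forall \xi \in \mathbb{R}^d$, $\underline{\lambda} |\xi|^2 \le \langle Dh(\theta)\xi, \xi \rangle$.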

Compared to [FM12], our assumptions are weaker. Indeed, it is assumed in [FM12] that the map $(\theta, u) \in \mathbb{R}^d \times \mathbb{R}^q \mapsto H(\theta, u)$ is uniformly Lipschitz continuous. In our current framework, this latter assumption is replaced by (HL) and (HLS)$_\alpha$.

The last assumption (HUA), which already appeared in [FM12], is introduced to derive a sharp estimate of the concentration rate in terms of the step sequence. Let us note that such an assumption appears in the study of the weak convergence rate order for the sequence $(\theta_n)_{n \ge 1}$ as described in [Duf96] or [KY03]. Indeed, it is commonly assumed that the matrix $Dh(\theta^*)$ is uniformly attractive, that is $\mathcal{R}e(\lambda_{min}) > 0$ where $\lambda_{min}$ is the eigenvalue with the smallest real part. In our current framework, this local condition on the Jacobian matrix of $h$ at the equilibrium is replaced by the uniform assumption (HUA). This allows one to derive sharp estimates for the concentration rate of the sequence $(\theta_n)_{n \ge 1}$ around its target $\theta^*$ and to provide a sensitivity analysis for the bias $\delta_n := \mathbb{E}[|\theta_n - \theta^*|]$ with respect to the starting point $\theta_0$.
Let us note that under (HUA) and the linear growth assumption

$$\forall \theta \in \mathbb{R}^d, \quad \mathbb{E}\big[|H(\theta, U)|^2\big] \le C(1 + |\theta - \theta^*|^2),$$

which is satisfied if (HL) and (HLS)$_\alpha$, with $\alpha \in [0, 1]$, hold and if $\mu$ satisfies (GC($\beta$)) for some $\beta > 0$, the function $L : \theta \mapsto \frac{1}{2} |\theta - \theta^*|^2$ is a Lyapunov function for the recursive procedure defined by (1.4) so that one easily deduces that $\theta_n \to \theta^*$, a.s. as $n \to +\infty$.

The global error between the stochastic approximation procedure $\theta_n$ at a given time step $n$ and its target $\theta^*$ can be decomposed as an empirical error and a bias as follows

$$|\theta_n - \theta^*| = |\theta_n - \theta^*| - \mathbb{E}_{\theta_0}[|\theta_n - \theta^*|] + \mathbb{E}_{\theta_0}[|\theta_n - \theta^*|] := \mathcal{E}_{Emp}(\gamma, n, H, \underline{\lambda}, \alpha) + \delta_n \tag{1.6}$$

where we introduced the notations $\mathcal{E}_{Emp}(\gamma, n, H, \underline{\lambda}, \alpha) = |\theta_n - \theta^*| - \mathbb{E}_{\theta_0}[|\theta_n - \theta^*|]$ and $\delta_n := \mathbb{E}_{\theta_0}[|\theta_n - \theta^*|]$.

The empirical error $\mathcal{E}_{Emp}(\gamma, n, H, \underline{\lambda}, \alpha)$ is the difference between the absolute value of the error at time $n$ and its mean whereas the bias $\delta_n$ corresponds to the mean of the absolute value of the difference between the sequence $(\theta_n)_{n \ge 0}$ at time $n$ and its target $\theta^*$. Unlike the Euler like scheme, a bias systematically appears since we want to derive a deviation bound for the difference between $\theta_n$ and its target $\theta^*$. This term strongly depends on the choice of the step sequence $(\gamma_n)_{n \ge 1}$ and the initial point $\theta_0$, see Proposition 4.4 for a sensitivity analysis.

As for Euler like schemes, our strategy is different from [FM12]. Indeed, we exploit again the fact that the stochastic approximation scheme (1.4) defines an inhomogeneous Markov chain having Feller transitions $P_k$, $k = 0, \dots, N-1$, defined for non negative or bounded Borel function $f : \mathbb{R}^d \to \mathbb{R}$ by

$$P_k(f)(\theta) := \mathbb{E}[f(\theta_{k+1}) \,|\, \theta_k = \theta] = \mathbb{E}[f(\theta - \gamma_{k+1} H(\theta, U))], \quad k = 0, \dots, N-1.$$

For every $k, p \in \{0, \dots, N-1\}$, $k < p$, we also define the iterative kernels for a non negative or bounded Borel function $f : \mathbb{R}^d \to \mathbb{R}$ as follows

$$P_{k,p}(f)(\theta) := P_k \circ \cdots \circ P_{p-1}(f)(\theta) = \mathbb{E}[f(\theta_p) \,|\, \theta_k = \theta].$$
For a 1-Lipschitz function $f$ and for all $\lambda \ge 0$, using (HLS)$_\alpha$ and that the law $\mu$ of the innovation satisfies (GC($\beta$)) for some positive $\beta$, we obtain

$$P_{N-1}(\exp(\lambda f))(\theta) = \mathbb{E}[\exp(\lambda f(\theta - \gamma_N H(\theta, U)))] \le \exp\Big(\lambda P_{N-1}(f)(\theta) + \beta \frac{\lambda^2}{4} C_\alpha^2 \gamma_N^2 L^{1-\alpha}(\theta)\Big). \tag{1.7}$$

Let us note the similarity between (1.3) and (1.7). If (HLS)$_\alpha$ holds with $\alpha = 1$ then the last term appearing in the right hand side of the last inequality is uniformly bounded in $\theta$. This latter assumption corresponds to the framework developed in [FM12] and leads to a Gaussian concentration bound.

Otherwise, the problem is more challenging. Under the mild domination assumption (HLS)$_\alpha$, the key idea consists again in exploiting recursively from (1.7) that the increments of the stochastic approximation algorithm (1.4) satisfy (GC($\beta$)) and in properly quantifying the contribution of the diffusion term $L^{1-\alpha}(\theta)$ to the concentration rate.

As already noticed in [FM12], the concentration rate and the bias strongly depend on the choice of the step sequence. In particular, if $\gamma_n = \frac{c}{n}$, with $c > 0$, then the optimal concentration rate and bias are achieved if $c > \frac{1}{2\underline{\lambda}}$, see Theorem 2.2 in [FM12]. Otherwise, they are sub-optimal. This kind of behavior is well-known concerning the weak convergence rate for stochastic approximation algorithms. Indeed, if $c > \frac{1}{2\mathcal{R}e(\lambda_{min})}$ we know that a Central Limit Theorem holds for the sequence $(\theta_n)_{n \ge 1}$ (see e.g. [Duf96]). Let us note that the condition $c > \frac{1}{2\underline{\lambda}}$ as well as $c > \frac{1}{2\mathcal{R}e(\lambda_{min})}$ is difficult to handle and may lead to a blind choice in practical implementation.

To circumvent such a difficulty, it is fairly well-known that the key idea is to carefully smooth the trajectories of a converging stochastic approximation algorithm by averaging according to the Ruppert & Polyak averaging principle, see e.g. [Rup91] and [PJ92]. It consists in devising the original stochastic approximation algorithm (1.4) with a slow decreasing step

7n=(^) , .gQ,i),c,6>0, 
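and to simultaneously compute the empirical mean $(\bar{\theta}_n)_{n \ge 1}$ of the sequence $(\theta_n)_{n \ge 0}$ by setting

$$\bar{\theta}_n = \frac{\theta_0 + \theta_1 + \cdots + \theta_{n-1}}{n} = \bar{\theta}_{n-1} - \frac{1}{n}\big(\bar{\theta}_{n-1} - \theta_{n-1}\big). \tag{1.8}$$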

We will not enter into the technicalities of the subject but under mild assumptions (see e.g. [Duf96], p. 169) one shows that

$$\sqrt{n}\,(\bar{\theta}_n - \theta^*) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, \Sigma^*), \quad n \to +\infty,$$

where $\Sigma^*$ is the optimal covariance matrix. For instance, for $d = 1$, one has $\Sigma^* = \frac{\mathrm{Var}(H(\theta^*, U))}{h'(\theta^*)^2}$. Hence, the optimal weak rate of convergence $\sqrt{n}$ is achieved for free without any condition on the constants $c$ or $b$. However, this result is only asymptotic and so far, to the best of our knowledge, non-asymptotic estimates for the deviation between the empirical mean sequence $(\bar{\theta}_n)_{n \ge 0}$ at a given time step and its target $\theta^*$, that is a non-asymptotic averaging principle, were not investigated.
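
The averaging procedure can be sketched as follows for $d = q = 1$, with the running mean updated through the recursion (1.8). The name `averaged_sa`, the Gaussian innovation, and the step $\gamma_n = n^{-3/4}$ used in the usage part are assumptions made for the illustration; in particular this step is just one example of a slowly decreasing sequence and not necessarily the exact parametrization used in the text.

```python
import random


def averaged_sa(H, theta0, n_steps, step, rng=None):
    """Return (theta_n, theta_bar_n) for n = n_steps, in dimension one.

    theta_bar_n = (theta_0 + ... + theta_{n-1}) / n is updated through (1.8).
    """
    if rng is None:
        rng = random.Random()
    theta = theta0       # theta_0
    theta_bar = 0.0      # placeholder, overwritten by theta_0 at n = 1
    for n in range(1, n_steps + 1):
        # (1.8): theta_bar_n = theta_bar_{n-1} - (theta_bar_{n-1} - theta_{n-1}) / n
        theta_bar = theta_bar - (theta_bar - theta) / n
        u = rng.gauss(0.0, 1.0)                  # innovation (Gaussian: an assumption)
        theta = theta - step(n) * H(theta, u)    # theta_n from (1.4)
    return theta, theta_bar


if __name__ == "__main__":
    # Toy problem with theta* = 1 and an illustrative slow step gamma_n = n^(-3/4).
    last, mean = averaged_sa(lambda th, u: th - 1.0 - u, 5.0, 20_000,
                             lambda n: 1.0 / n ** 0.75, random.Random(2))
    print(last, mean)
```

The running-mean update costs one extra operation per iteration and requires no storage of the past trajectory.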

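The sequence $(z_n)_{n \ge 0}$ defined by $z_n := (\bar{\theta}_{n+1}, \theta_n)$ is $\mathcal{F}$-adapted, i.e. for all $n \ge 0$, $z_n$ is $\mathcal{F}_n$-measurable, where $\mathcal{F}_n := \sigma(\theta_0, U_k;\ k \le n)$. Moreover, it defines an inhomogeneous Markov chain having Feller transitions $K_k$, $k = 0, \dots, N-1$, defined for non negative or bounded Borel function $f : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ by

$$K_k(f)(z) := \mathbb{E}[f(z_{k+1}) \,|\, z_k = z] = \mathbb{E}\big[f(\bar{\theta}_{k+2}, \theta_{k+1}) \,\big|\, (\bar{\theta}_{k+1}, \theta_k) = (z_1, z_2)\big] = \mathbb{E}\Big[f\Big(\frac{k+1}{k+2} z_1 + \frac{1}{k+2}\big(z_2 - \gamma_{k+1} H(z_2, U)\big),\ z_2 - \gamma_{k+1} H(z_2, U)\Big)\Big].$$

For every $k, p \in \{0, \dots, N-1\}$, $k < p$, we define the iterative kernels for a non negative or bounded Borel function $f : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$

$$K_{k,p}(f)(z) := K_k \circ \cdots \circ K_{p-1}(f)(z) = \mathbb{E}[f(z_p) \,|\, z_k = z].$$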

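Hence, for any 1-Lipschitz function $f$ and for all $\lambda \ge 0$, using again (HLS)$_\alpha$ and that the law $\mu$ of the innovation satisfies (GC($\beta$)) for some positive $\beta$, one has for all $k \in \{0, \dots, N-1\}$

$$K_k(\exp(\lambda f))(z) = \mathbb{E}[\exp(\lambda f(z_{k+1})) \,|\, z_k = z] \le \exp\Big(\lambda K_k(f)(z) + \beta \frac{\lambda^2}{4} \Big(C_\alpha \gamma_{k+1} \Big(\frac{1}{k+2} + 1\Big)\Big)^2 L^{1-\alpha}(z_2)\Big) \le \exp\Big(\lambda K_k(f)(z) + \beta \lambda^2 C_\alpha^2 \gamma_{k+1}^2 L^{1-\alpha}(z_2)\Big) \tag{1.9}$$

where we used that for all $(z_1, z_2) \in \mathbb{R}^d \times \mathbb{R}^d$, the functions $u \mapsto f\big(\frac{k+1}{k+2} z_1 + \frac{1}{k+2}(z_2 - \gamma_{k+1} H(z_2, u)),\ z_2 - \gamma_{k+1} H(z_2, u)\big)$ are Lipschitz-continuous with Lipschitz modulus equal to $C_\alpha \gamma_{k+1} \big(\frac{1}{k+2} + 1\big) L^{\frac{1-\alpha}{2}}(z_2)$.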

Here again, (1.7) and (1.9) are quite similar and if $\alpha = 1$ the concentration regime turns out to be Gaussian. Otherwise, an analysis along the lines of the methodology developed so far provides the concentration regime of the stochastic approximation algorithm with averaging of trajectories.

1.3. Transport-Entropy inequalities 

As a by-product of our analysis, we derive transport-entropy inequalities for the law of both stochastic approximation schemes. We recall here basic definitions and properties. For a complete overview and recent developments in the theory of transport inequalities, the reader may refer to the recent survey [GL10]. We will denote by $\mathcal{P}(\mathbb{R}^d)$ the set of probability measures on $\mathbb{R}^d$.

For $p \ge 1$, we consider the set $\mathcal{P}_p(\mathbb{R}^d)$ of probability measures with finite moment of order $p$. The Wasserstein metric $W_p(\mu, \nu)$ of order $p$ between two probability measures $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^d)$ is defined by

$$W_p^p(\mu, \nu) = \inf\Big\{ \int |x - y|^p\, \pi(dx, dy) \ :\ \pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d),\ \pi_0 = \mu,\ \pi_1 = \nu \Big\},$$

where $\pi_0$ and $\pi_1$ are two probability measures standing for the first and second marginals of $\pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$. For $\mu \in \mathcal{P}(\mathbb{R}^d)$, we define the relative entropy w.r.t. $\nu \in \mathcal{P}(\mathbb{R}^d)$ as

$$H(\mu, \nu) = \int_{\mathbb{R}^d} \log\Big(\frac{d\mu}{d\nu}\Big)\, d\mu$$

if $\mu \ll \nu$ and $H(\mu, \nu) = +\infty$ otherwise. We are now in position to define the notion of transport-entropy inequality. Here as below, $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$ is a convex, increasing function with $\Phi(0) = 0$.

Definition 1.1. A probability measure $\mu$ on $\mathbb{R}^d$ satisfies a transport-entropy inequality with function $\Phi$ if for all $\nu \in \mathcal{P}(\mathbb{R}^d)$, one has

$$\Phi(W_1(\nu, \mu)) \le H(\nu, \mu).$$

For the sake of simplicity, we will write that $\mu$ satisfies $T_\Phi$.

The following proposition comes from Corollary 3.4 of [GL10].

Proposition 1.1. The following propositions are equivalent:
• The probability measure $\mu$ satisfies $T_\Phi$.
• For all 1-Lipschitz function $f$, one has

$$\forall \lambda \ge 0, \quad \int \exp(\lambda f)\, d\mu \le \exp\Big(\lambda \int f\, d\mu + \Phi^*(\lambda)\Big),$$

where $\Phi^*$ is the monotone conjugate of $\Phi$ defined on $\mathbb{R}_+$ as $\Phi^*(\lambda) = \sup_{\rho \ge 0} \{\lambda \rho - \Phi(\rho)\}$.
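
As a worked example of the monotone conjugate, take the quadratic function $\Phi(\rho) = \rho^2/\beta$ (a choice made here for illustration). The supremum is attained at $\rho = \beta\lambda/2$:

```latex
\begin{align*}
\Phi^*(\lambda) = \sup_{\rho \ge 0}\Big\{\lambda\rho - \frac{\rho^2}{\beta}\Big\}
  = \lambda\,\frac{\beta\lambda}{2} - \frac{1}{\beta}\Big(\frac{\beta\lambda}{2}\Big)^2
  = \frac{\beta\lambda^2}{4}.
\end{align*}
```

With this choice, the second statement of Proposition 1.1 is exactly the Gaussian concentration property (GC($\beta$)), and $T_\Phi$ reads $W_1(\nu, \mu) \le \sqrt{\beta H(\nu, \mu)}$.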

Such transport-entropy inequalities are very attractive especially from a numerical point of view since they are related to the concentration of measure phenomenon which allows one to establish non-asymptotic deviation estimates. The three next results put an emphasis on this point. Suppose that $(X_n)_{n \ge 1}$ is a sequence of independent and identically distributed $\mathbb{R}^d$-valued random variables with common law $\mu$.

Corollary 1.1. If $\mu$ satisfies $T_\Phi$ then for all 1-Lipschitz function $f$ and for all $r \ge 0$, for all $M \ge 1$, one has

$$\mathbb{P}\Big(\Big|\frac{1}{M} \sum_{k=1}^{M} f(X_k) - \mathbb{E}[f(X_1)]\Big| \ge r\Big) \le 2 \exp(-M \Phi(r)).$$
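
In practice, Corollary 1.1 can be inverted to choose the number of simulations: the bound $2\exp(-M\Phi(r))$ is below a prescribed level $\delta$ as soon as $M \ge \log(2/\delta)/\Phi(r)$. A minimal sketch, where `min_sample_size` is a hypothetical helper name and the Gaussian rate $\Phi(r) = r^2/\beta$ in the usage part is an illustrative choice:

```python
import math


def min_sample_size(phi, r, delta):
    """Smallest integer M >= 1 such that 2 * exp(-M * phi(r)) <= delta."""
    rate = phi(r)
    if rate <= 0.0 or not 0.0 < delta < 2.0:
        raise ValueError("need phi(r) > 0 and 0 < delta < 2")
    return max(1, math.ceil(math.log(2.0 / delta) / rate))


if __name__ == "__main__":
    beta = 2.0                               # Gaussian case, Phi(r) = r^2 / beta
    M = min_sample_size(lambda r: r * r / beta, 0.1, 0.05)
    print(M)
```

The returned value only guarantees the two-sided deviation bound of the corollary; it does not account for the discretization error of the scheme.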



Proposition 1.2. If $\mu$ satisfies $T_\Phi$ then the empirical measure $\mu^n$ defined as $\mu^n = \frac{1}{n} \sum_{k=1}^{n} \delta_{X_k}$ satisfies the following concentration bound

$$\mathbb{P}\big(W_1(\mu^n, \mu) \ge \mathbb{E}[W_1(\mu^n, \mu)] + r\big) \le \exp(-n \Phi(r)),$$

where for $x \in \mathbb{R}^d$, $\delta_x$ stands for the Dirac mass at point $x$.
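
To get a feeling for the distance $W_1$ between empirical measures, the one-dimensional case (an assumption made only for this illustration) is explicit: for two samples of the same size, the optimal coupling pairs the order statistics, so $W_1$ is the mean absolute difference of the sorted samples. The function name below is a hypothetical helper.

```python
def wasserstein1_empirical_1d(xs, ys):
    """W_1 between the empirical measures of two real samples of equal size."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # In dimension one the monotone (sorted) pairing is an optimal coupling.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

In higher dimension no such closed form is available and $W_1$ has to be computed by solving an optimal transport problem.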

The quantity $\mathbb{E}[W_1(\mu, \mu^n)]$ will go to zero as $n$ goes to infinity, by convergence of empirical measures, but we still need quantitative bounds. The next result is an adaptation of a result of [RR98] on similar bounds but for the distance $W_2$. For the sake of completeness, we provide a proof in Appendix 4.2.



Proposition 1.3. Assume that $\mu$ has a finite moment of order $d + 3$. Then, one has

$$\mathbb{E}[W_1(\mu^n, \mu)] \le C(d, \mu)\, n^{-1/(d+2)},$$

where



C{d, fi) := 4Vd + 2y y" (1 + \x\'^+^)~^dx^ 2-^^ ^ 2^-^ J \y\'^+^ fJ-idy) + 2^-dd{d + 3)!. 
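In view of the Kantorovich-Rubinstein duality formula, namely

$$W_1(\mu, \nu) = \sup\Big\{ \int f\, d\mu - \int f\, d\nu \ :\ [f]_1 \le 1 \Big\},$$

where $[f]_1$ denotes the Lipschitz modulus of $f$, the latter result provides the following concentration bounds

$$\forall r \ge 0, \ \forall M \ge 1, \quad \mathbb{P}\Big( \sup_{f : [f]_1 \le 1} \Big( \frac{1}{M} \sum_{k=1}^{M} f(X_k) - \mathbb{E}[f(X_1)] \Big) \ge C(d, \mu) M^{-1/(d+2)} + r \Big) \le \exp(-M \Phi(r)).$$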

Similar results were first obtained for different concentration regimes by Bolley, Guillin, Villani [BGV07] relying on a non-asymptotic version of Sanov's Theorem. Some of these results have also been derived by Boissard [Boi11] using concentration inequalities, and were also extended to ergodic Markov chains up to some contractivity assumptions in the Wasserstein metric on the transition kernel.

Some applications are proposed in [BGV07]. Such results can indeed provide non-asymptotic deviation bounds for the estimation of the density of the invariant measure of a Markov chain. Let us note that the (possibly large) constant $C(d, \mu)$ appears as a trade-off to obtain uniform deviations over all Lipschitz functions.

As a consequence of the transport-entropy inequalities obtained for the laws at a given time step of Euler like schemes and stochastic approximation algorithms, we will derive non-asymptotic deviation bounds in the Wasserstein metric.



2. Main Results 
2.1. Euler like schemes and diffusions 

Theorem 2.1 (Transport-Entropy inequalities for Euler like schemes). Denote by $X_T^{\Delta,0,x}$ the value at time $T$ of the scheme (1.2) associated to the diffusion (SDE$_{b,\sigma}$) starting at point $x$ at time 0. Denote the Lipschitz modulus of $b$ and $\sigma$ appearing in the diffusion process (SDE$_{b,\sigma}$) by $[b]_1$ and $[\sigma]_1$, respectively, and by $\mu_T^{\Delta,0,x}$ the law of $X_T^{\Delta,0,x}$. Assume that the innovations $(U_i)_{i \ge 1}$ in (1.2) satisfy (GC($\beta$)) for some $\beta > 0$ and that the coefficients $b, \sigma$ satisfy (HS) and (HD$_\alpha$) for $\alpha \in [\frac{1}{2}, 1]$.

Then, $\mu_T^{\Delta,0,x}$ satisfies $T_{\Phi_\alpha^*}$ with $\Phi_\alpha^*(\lambda) = \sup_{\rho \ge 0} \{\lambda \rho - \Phi_\alpha(\rho)\}$
with:

• Ifae (i,l], for all p>0 

= *„(T, A, b, a, x)(p' V 

with *„(T,A,fo,a,x) =i^3.iMr,6,a,A)2 Vv'(r,&,a,A)^), ^(T, fe, a. A) = a/3 (i±ggl^e3C^(^)^ 
C(A) 2[6]i + [crW + l^\b]\ and the constant K^ i being defined in Corollaru \3.1\ 

• Ifa^^,forallpe [0, <f{T, b, a, A)~^/^X3,2) 



where the positive constants A3. 2 and K^,2 are defined in Corollarv \3.SX 

Note that in the above theorem, we do not need any non-degeneracy condition on the diffusion coefficient. 
In the case $\alpha \in (\frac{1}{2}, 1]$, one easily gets the following explicit formula:

• If A e [0,2*], then $;(A) = ^^A^; 

. If A e [^*, +00), then $;(A) = {^f^' A^"; 

. If A e (2*, ^vl/),then $;(A) = A - 

Let us note that the linear behavior of $* on a small interval is due to the fact that is not C^. One may 
want to replace p^ V p^^-i by p^ + (up to a factor 2) in the expression of However, in this case, an 

explicit expression for $* does not exist (except for the case a = 1) and only its asymptotic behavior can be 
derived so that one is led to compute it numerically in practical situations. 

In the case a = 1/2, tedious but simple computations show that 




'^3.2 



A3.295(r,6,a,A)i/2- 



This behavior corresponds to a concentration profile that is Gaussian at short distance, and exponential at large 
distance. 



Corollary 2.1. (Non-asymptotic deviation bounds) Under the same assumptions as Theorem 2.1, one has:

• for all real-valued 1-Lipschitz function $f$ defined on $\mathbb{R}^d$, for all $\alpha \in [1/2, 1]$, for all $M \ge 1$ and all $r \ge 0$,

$$\mathbb{P}\Big( \Big| \frac{1}{M} \sum_{k=1}^{M} f((X_T^{\Delta})^k) - \mathbb{E}_x[f(X_T^{\Delta})] \Big| \ge r \Big) \le 2 \exp(-M \Phi_\alpha^*(r)),$$

• for all $\alpha \in [1/2, 1]$, for all $M \ge 1$ and all $r \ge 0$,

$$\mathbb{P}\Big( \sup_{f : [f]_1 \le 1} \Big( \frac{1}{M} \sum_{k=1}^{M} f((X_T^{\Delta})^k) - \mathbb{E}_x[f(X_T^{\Delta})] \Big) \ge C(d, \mu_T^{\Delta,0,x}) M^{-1/(d+2)} + r \Big) \le \exp(-M \Phi_\alpha^*(r)),$$

where the $((X_T^{\Delta})^k)_{1 \le k \le M}$ are $M$ independent copies of the scheme (1.2) starting at point $x$ at time 0 and evaluated at time $T$.

Remark 2.1 (Extension to smooth functions of a finite number of time steps). The previous transport-inequalities and non-asymptotic bounds could be extended to smooth functions of a finite number of time steps such as the maximum of a scalar Euler like scheme. In that case, it suffices to introduce the additional state variable $(M^\Delta_{t_i})_{i \ge 1} := (\max_{k \in [[0,i]]} X^\Delta_{t_k})_{i \ge 1}$. Now, the couple $(X^\Delta_{t_i}, M^\Delta_{t_i})_{0 \le i \le N}$ is Markovian and similar arguments could be easily extended to the couple for Lipschitz functions of both variables.
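The state augmentation described in this remark can be sketched as follows for a scalar scheme. The function name, the Gaussian innovations and the coefficient signatures are assumptions for the illustration; the point is only that the pair (current value, running maximum) is updated from its previous value and the new innovation alone.

```python
import numpy as np


def euler_with_running_max(x0, b, sigma, T, n_steps, rng):
    """Simulate the couple (X_{t_i}, M_{t_i}) where M is the running maximum of X.

    Returns an array of shape (n_steps + 1, 2); row i holds (X_{t_i}, M_{t_i}).
    """
    dt = T / n_steps
    x = float(x0)
    m = x
    path = [(x, m)]
    for i in range(n_steps):
        t = i * dt
        u = rng.standard_normal()
        # One Euler step for X, then the max update: both depend only on (x, m) and u.
        x = x + b(t, x) * dt + sigma(t, x) * np.sqrt(dt) * u
        m = max(m, x)
        path.append((x, m))
    return np.array(path)
```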

Remark 2.2 (Transport-Entropy inequalities for the law of a diffusion process). The previous transport-inequalities and non-asymptotic bounds could be extended to the law at time $T$ of the diffusion process solution to $(SDE_{b,\sigma})$ by passing to the limit $\Delta \to 0$. Indeed, it is well-known that under (HS), one has $X_T^\Delta \to X_T$ as $\Delta \to 0$ and, by Lebesgue's theorem, one deduces from the first result of Corollary 2.1 that the empirical error (empirical mean) of $X_T$ itself satisfies a non-asymptotic deviation bound with a similar deviation function (just pass to the limit $\Delta \to 0$ in all constants). Then, using Corollary 5.1 in [GL10] (equivalence between deviation of the empirical mean and transport-entropy inequalities), one easily derives that the law of $X_T$ satisfies a similar transport-entropy inequality when $\alpha \in (1/2, 1]$.

We want to point out that it is the growth of $\sigma$ that gives the concentration regime, ranging from a Gaussian concentration bound if $\alpha = 1$ to an exponential one when $\alpha = \frac12$. However, in many popular models in finance, the diffusion coefficient is linear; for instance practitioners often have to deal with Black-Scholes like dynamics of the form

$$X_t = x_0 + \int_0^t b(X_s)X_s\,ds + \int_0^t \sigma(X_s)X_s\,dW_s$$



for smooth, bounded coefficients $b, \sigma$. For the estimation of $\mathbb{E}_x[f(X_T)]$ for a Lipschitz function $f : \mathbb{R}^d \to \mathbb{R}$, or even in more general situations, the estimation of $\mathbb{E}_x[f(X)]$ for a Lipschitz function $f : \mathcal{C} \to \mathbb{R}$, where $\mathcal{C} := C([0,T], \mathbb{R}^d)$ stands for the space of $\mathbb{R}^d$-valued continuous functions on $[0,T]$, equipped with the uniform norm $\|f\|_\infty := \sup_{0 \le t \le T}|f(t)|$, the expected concentration is the log-normal one. To deal with the latter case, we consider the continuous Euler scheme $X^\Delta$ associated to $(SDE_{b,\sigma})$ and writing

$$\forall t \in [0,T], \quad X_t^\Delta = x + \int_0^t b(\phi(s), X^\Delta_{\phi(s)})\,ds + \int_0^t \sigma(\phi(s), X^\Delta_{\phi(s)})\,dW_s, \quad x \in \mathbb{R}^d, \qquad (2.1)$$

where we set $\phi(t) := t_i$ for $t_i \le t < t_{i+1}$, $i \in \mathbb{N}$. The next result provides a general non-asymptotic deviation bound for the empirical error under very mild assumptions.

Theorem 2.2 (General non-asymptotic deviation bounds). Denote by $X^\Delta := (X_t^\Delta)_{0 \le t \le T}$ the path of the scheme (2.1) with step $\Delta$ starting from point $x$ at time 0. Assume that $\forall t \in [0,T]$, the coefficients $b(t,.)$ and $\sigma(t,.)$ are continuous functions in $x$ and that they satisfy the linear growth assumption:

$$\forall x \in \mathbb{R}^d, \quad \sup_{t \in [0,T]}|b(t,x)| \le C_b(1+|x|), \quad \sup_{t \in [0,T]}\mathrm{Tr}(a(t,x)) \le C_\sigma(1+|x|^2).$$

Then, for all 1-Lipschitz function $f : \mathcal{C} \to \mathbb{R}$, for all $M \in \mathbb{N}^*$, for all $r \ge 0$, one has



< 



fc=i 



2cxp (^- 4K(b!^.r) log ( (2(i+H))2 ) ot/iermse 



where $\kappa(b,\sigma,T) := 28(1 + (C_\sigma \vee C_b)T)$ and $((X^\Delta)^k)_{1 \le k \le M}$ are $M$ independent copies of the scheme (2.1).
The result remains valid when one considers the path of the diffusion $X$ solution to $(SDE_{b,\sigma})$ instead of the continuous Euler scheme.

2.2. Stochastic approximation algorithms 

Theorem 2.3 (Transport-Entropy inequalities for stochastic approximation algorithms). Let $N \in \mathbb{N}^*$. Assume that the function $H$ of the recursive procedure $(\theta_n)_{0 \le n \le N}$ (with starting point $\theta_0 \in \mathbb{R}^d$) defined by (1.4) satisfies (HL), (HUA) and (HLS)$_\alpha$ for $\alpha \in [\frac12, 1]$, and that the step sequence $\gamma = (\gamma_n)_{n \ge 1}$ satisfies (1.5). Suppose that the law of the innovation satisfies (GC($\beta$)), $\beta > 0$. Denote by $\mu_N^{\gamma}$ the law of $\theta_N$.
Then, $\mu_N^{\gamma}$ satisfies $T_{\Psi^*_{\alpha,N}}$ with $\Psi^*_{\alpha,N}(\lambda) = \sup_{\rho \ge 0}\{\lambda\rho - \Psi_{\alpha,N}(\rho)\}$ and one has:

• Ifae (i,l], for all p>0 

with the two concentration rates C'Jq := X^fcLo^ it+i ni " ' '^^^^ ^i.N '■= 11^0^(1 ^ 2A7fc+i + 6*^.^47^+1) 

and Cji" 7^^ (fe)^ ((^ + 1) ^og\k + 4))^ for all N > I, where Ch.^ := 2C|(1 + 

E[|J7p]) and (pa{j, H,0o) is an explicit constant defined in Proposition \4-3\ 

• Ifa=^,forallpe[0,X4:.i/sN), 

1 - [PSN/M.l) 

1 

with SN ■■= maxo<fc<Ar_i(fc + 1)^/^ \og{k + 4)7fc+i (jff^'j ' (^^viJ2pJo^ (p+i) iog^(p+4) ) ""'^ (positive) 
constants (^1/2(7? ^7 ^0) ci'i^d, A4.1 are defined in Proposition \4-3\ 

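As in the case of Euler like schemes, for $\alpha \in (\frac12, 1]$, writing $\varphi_\alpha$ for $\varphi_\alpha(\gamma, H, \theta_0)$, we have:

• If $\lambda \in [0, 2\varphi_\alpha(C_N^\gamma/(C_N^{\gamma,\alpha})^{2\alpha-1})^{\frac{1}{2(1-\alpha)}}]$, then $\Psi^*_{\alpha,N}(\lambda) = \lambda^2/(4\varphi_\alpha C_N^\gamma)$;

• If $\lambda \in [\frac{2\alpha}{2\alpha-1}\varphi_\alpha(C_N^\gamma/(C_N^{\gamma,\alpha})^{2\alpha-1})^{\frac{1}{2(1-\alpha)}}, +\infty)$, then $\Psi^*_{\alpha,N}(\lambda) = \frac{1}{2\alpha}\left(\frac{2\alpha-1}{2\alpha\varphi_\alpha}\right)^{2\alpha-1}\lambda^{2\alpha}/(C_N^{\gamma,\alpha})^{2\alpha-1}$;

• If $\lambda \in (2\varphi_\alpha(C_N^\gamma/(C_N^{\gamma,\alpha})^{2\alpha-1})^{\frac{1}{2(1-\alpha)}}, \frac{2\alpha}{2\alpha-1}\varphi_\alpha(C_N^\gamma/(C_N^{\gamma,\alpha})^{2\alpha-1})^{\frac{1}{2(1-\alpha)}})$, then $\Psi^*_{\alpha,N}(\lambda) = \left(\frac{C_N^\gamma}{C_N^{\gamma,\alpha}}\right)^{\frac{2\alpha-1}{2(1-\alpha)}}\lambda - \varphi_\alpha\frac{(C_N^\gamma)^{\frac{\alpha}{1-\alpha}}}{(C_N^{\gamma,\alpha})^{\frac{2\alpha-1}{1-\alpha}}}$.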



For a = we obtain the following explicit bound for the Legendre transform of $1/2, 



N 



VA>0, *;„,„(A) = ?fgi fi + li§iiV-i 



2 



Hence, for $N \ge 1$ being fixed, the following simple asymptotic behaviors can be easily derived:
• When $\lambda$ is small, $\Psi^*_{1/2,N}(\lambda) \sim \lambda_{4.1}^2\lambda^2/(2\varphi_{1/2} C_N^\gamma)$;

• When $\lambda$ goes to infinity, $\Psi^*_{1/2,N}(\lambda) \sim \lambda_{4.1}\lambda/s_N$.

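Corollary 2.2. (Non-asymptotic deviation bounds) Under the same assumptions as Theorem 2.3, one has

$$\mathbb{P}_{\theta_0}(|\theta_N - \theta^*| \ge r + \delta_N) \le \exp(-\Psi^*_{\alpha,N}(r))$$
and $\delta_N := \mathbb{E}_{\theta_0}[|\theta_N - \theta^*|]$. Moreover, the bias $\delta_N$ at step $N$ satisfies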

CN-i \ i 

E 7,\ie-2A(ri,„-ri,.+i)+2C„„(r.,„-r.,.+i) ^ 
fe=0 / 

where Ti^n := ELi Ik, T2,n := Ek=i ll <^o,m ■= A'/S + 2a/vE[|C/|2] with K > 0. 

Now, we investigate the impact of the step sequence $(\gamma_n)_{n \ge 1}$ on the concentration rate sequences $C_N^\gamma$, $C_N^{\gamma,\alpha}$, $s_N$ and the bias $\delta_N$. Let us note that a similar analysis has been performed in [FM12]. We obtain the following results:

• If we choose 7„ = ^, with c > 0. Then (J^r — > 0, — > +oo, Ti n ~ c\og{N) + c\ + r^, c[ > and 
rjv 0, so that Ui,n = 0{N^'^''^). 

- If c < the series Etillf^i^k. E^JoSST (l/n?f^)((A: + 1) log2(fc + 4))^ converge so 
that we obtain CJ^ = ©(iV'^cA)^ ^ ^^^_^cA^^ ^ ©(iV-^A). 

— If c > a comparison between the series and the integral yields Cj^ = 0{N~^), Cjj" = 

0{{\og{N)f^ N-^), Sn = 0{\og{N)N-i). 

Let us notice that we find the same critical level for the constant $c$ as in the Central Limit Theorem for stochastic algorithms. Indeed, if $c > \frac{1}{2\mathrm{Re}(\lambda_{\min})}$ where $\lambda_{\min}$ denotes the eigenvalue of $Dh(\theta^*)$ with the smallest real part then we know that a Central Limit Theorem holds for $(\theta_n)_{n \ge 1}$ (see e.g. [Duf96], p. 169). Such behavior was already observed in [FM12].
The associated bound for the bias is the following:



Sn < K 



• If we choose 7„ = c > 0, i < p < 1, then 5n — > 0, Fi.at ~ rr^^^ as — > +oo and elementary 
computations show that there exists C > s.t. for all iV > 1, Hi^n < C exp(— 2Ay^A^^^''). Hence, for 
all e e (0, 1 — p) we have: 

k=l [ fe=l fc=Ar-AfP+^ + l 

< <^ Cexp(-2A (N^-P - (N - NP+'V-p)) + ^ } 

< jcexp(-2AcA^') + 



Up to a modification of e, this yields Cjj ~ ^i,Nj2k=ilk^ik = °(-^ e G (0, 1 — p)- Similar 



computations show that C^'" = o{N " ^c-i ^) and we clearly get sn — O (\og{N)N ^p 2) 
Concerning the bias, from Corollary 2.2 we directly obtain the following bound:

Si, < K (^cxp ^-^N^-^j \e, - r I + ^^^1^^ , Ve > 0. 

The impact of the initial difference $|\theta_0 - \theta^*|$ is exponentially smaller compared to the case $\gamma_n = \frac{c}{n}$. This is natural since the step sequence decreases more slowly to 0.

Theorem 2.4 (Transport-Entropy inequalities for stochastic approximation with averaging of trajectories). 
Let N e N*. Assume that the function H of the recursive procedure 9 ~ i6n)o<n<N (with starting point 
9o G IV^) defined by (|1.4p satisfies (HL), (HUA) and (HLS)^ for a G [5,1], and that the step sequence 
7 = {ln)n>i satisfies (|1.5p . Suppose that the law of the innovation satisfies {GC{j3)], /3 > 0. Denote by 
^7,0,6(0 of 9n where 9 is the empirical mean of 9 defined by (jl.Sp . Then, p-jf' " satisfies T^* ^ with 

^*a,NW = supp>(, {Xp - ^a.N{p)} and one has: 

• Ifae (i, 1], for all p>0 

^c..n{p) = vo.h,H,9„){clp^ y c]-"p^) 

where (^0,(7, iJ, 0o) a positive constant defined in Section \4.2\ 

• If O- ^ \, for all p e [0, A4.1/S7V), 

1 - [psNlM.i) 

where ipi/2i'y, H,9q) and A4.1 are positive constants defined in Proposition \4-^ 
where the three concentration rate sequences are defined for N G N* by 



N-l N-1 



_ _ za 1 — a 1 ■r-^iV — 1 

--^ E ^l^' ^n" E "TkA ((^+1) log'(fc+4))^, SN ^^max_^(fc+l)^ log(fc+4)7,,Are^-° (p+d .0.^(^+4) 

fe=l k=l ^ ^ 

with jk^N := ^(1 + T,f=k+ii^)^)' a"'^ ni,Ar := Up=a - ^Xr/p+i + Cn^plp+i). 

As regards the explicit computation of the Legendre transform of ^a.N, similarly to the previous theorem, 
we have: 

• for a e (i, 1]: 

- if A e [0,2^{Cl/{Cjj''r-')^], then $;^^(A) = (XVA^CI); 

- If A e [^^iCj,/{Crr"-')^,+^), then ^l^X) = ^ (1^)'""' (A2«/(C;^-)2-i); 

- If A e {2^{Cl/{Crr''~')^,^MCl/iCrr''-')^), then $;,^(A) = (^)^A- 



• for a = i 



2 



Hence, for $N \ge 1$ being fixed, the following simple asymptotic behaviors can be easily derived:
- When $\lambda$ is small, $\bar\Psi^*_{1/2,N}(\lambda) \sim \lambda_{4.1}^2\lambda^2/(2\varphi_{1/2}\bar C_N^\gamma)$;
- When $\lambda$ goes to infinity, $\bar\Psi^*_{1/2,N}(\lambda) \sim \lambda_{4.1}\lambda/\bar s_N$.

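Corollary 2.3. (Non-asymptotic deviation bounds) Under the same assumptions as Theorem 2.4, for all $N \ge 1$, for all $r \ge 0$, one has

$$\mathbb{P}_{\theta_0}(|\bar\theta_N - \theta^*| \ge r + \bar\delta_N) \le \exp(-\bar\Psi^*_{\alpha,N}(r))$$

and $\bar\delta_N := \mathbb{E}_{\theta_0}[|\bar\theta_N - \theta^*|]$.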

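Now, we analyze the impact of the step sequence on the concentration rate sequences $\bar C_N^\gamma$, $\bar C_N^{\gamma,\alpha}$, $\bar s_N$ and the bias $\bar\delta_N$. We first simplify the expression of the concentration rate. Let us note that since the step sequence $(\gamma_n)_{n \ge 1}$ satisfies (1.5), there exists a positive constant $K > 0$ such that $(\Pi_{1,j}\Pi_{1,k+1}^{-1})^{\frac12} \le K\exp(-\lambda(\Gamma_{1,j} - \Gamma_{1,k+1}))$, $k < j$. Moreover, since the function $x \mapsto \exp(-\lambda x)$ is decreasing on $[\Gamma_{1,p}, \Gamma_{1,p+1}]$, one clearly gets for all $i, j \in \{0, \cdots, N-1\}$, $i < j$

$$\sum_{p=i}^{j-1}\exp(-\lambda\Gamma_{1,p+1})\gamma_{p+1} = \sum_{p=i}^{j-1}\int_{\Gamma_{1,p}}^{\Gamma_{1,p+1}}\exp(-\lambda\Gamma_{1,p+1})\,dx \le \frac{1}{\lambda}\left(\exp(-\lambda\Gamma_{1,i}) - \exp(-\lambda\Gamma_{1,j})\right)$$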



SO that, using the latter bound and an Abel transform, we obtain 

J2 exp(-Ari,,+i) = iMj+i-M,)jJ^,<--i (exp(-Ari,,+i) - exp(-Ari,,))7-+\ 



j=k+l j=k+l — \j=k+l 



N-1 



A . 

p=k+l 

which finally leads to the following bound 




(2.2) 

Now, we are in position to study the impact of the step sequence (7n)n>i on the concentration rate sequences: 
• If we select $\gamma_n = \frac{c}{n}$ with $c > 0$, then, using that $\Gamma_{1,N} = c\log(N) + c'_1 + r_N$, $c'_1 > 0$ with $r_N \to 0$, one easily derives from (2.2) that there exists $C > 0$ such that

( 1 1 1 1 

\ p=k ^ 



and a comparison between the series and the integral yields the following bounds: 

- If Ac < i, one has: Cj^ = 0{N-'^'=^), Cjj" = OiN-^"^) and sn = ©(iV""^). 

- If Ac> i, one has: C]^ = 0{N-^), Cj/" = 0{{\og{N)f^N-^) and sn = 0{N-^). 
Hence, we clearly see that for the case $\gamma_n = \frac{c}{n}$, averaging the trajectories of a stochastic approximation algorithm is not the key to circumvent the lack of robustness concerning the choice of the constant $c$.

The bound for the bias is obtained by averaging the bound previously obtained for $\delta_n$. We easily get:



k=0 \ 
If we choose $\gamma_n = \frac{c}{n^\rho}$, $c > 0$, $\frac12 < \rho < 1$, then we have for $k < p$



p 



p 



ri,p-r 



j=k+l j=k+l'^l 



i+1 



-dx > 



J' 



^dx >^{{p + ly-p _ (fc + i)i-p) 
fe+i ^ 1 — " 



so that for some positive constant $C$ which may vary from line to line

JV-l 



E ' 

p=k+l 




(p+i)'-". 



fe+i 



A 2p-l 

e i-p^a; i-p da; 



(fc+i)i-p 



where we use a change of variable in the latter integral. For k large enough, the function x i— >■ e ^-p^x'^-p 
is decreasing on [fc, +oo) which implies 



(fc+i)'-'' 



(W-l)!-" A 2 1 

e-T^"=a;T^ ^^dx < C(fc + iff 

(k+iy-p x^-p 



1 



-x 



+00 



< C(fc + l)f 



(k+iy-p 



Hence, we finally have 7^,^, = 0{N-^) so that C]^ = 0{N-^), C]^'^ = 0{{\og{N)f'^N-'^) 

and Sm = 0{log{N)N^2y Hence, averaging has allowed the concentration rate to go from the slow 

I p-(i- 
conccntration rates o{N~''^'^), o{N ^a- 



'-) for all e > and O (^log(7V)iV"(''~^)) to the optimal 



rates 0{N~^), 0{{\og{N)f^^N^^^) and sat = 0(log(A^)A^"5) for free, i.e. without any condition 
on the step sequence parameter c. 

Concerning the bias, by averaging the bias sequence {Sk)i<k<N~i we directly obtain the following 
bound 



Sm < K 



N 



N'i- 



, Ve > 



Hence, we see that there is no sub-exponential decrease of the impact of the initial condition but a decay at rate $O(N^{-1})$. Consequently, this leads us to say that, in practical implementations, a stochastic approximation algorithm must be averaged after a few iterations and not directly from the first step.
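The effect of starting the averaging after a few iterations can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the scalar mean field $h(\theta) = \theta - \theta^*$, the additive Gaussian noise, the step $\gamma_n = c\,n^{-\rho}$, the plain arithmetic mean used for averaging, and the names `robbins_monro` and `averaged_estimate`.

```python
import numpy as np


def robbins_monro(theta0, theta_star, c, rho, n_iter, noise_scale, rng):
    """Iterate theta_{n+1} = theta_n - gamma_{n+1} * (theta_n - theta_star + noise).

    Uses gamma_n = c / n**rho; returns the array (theta_0, ..., theta_{n_iter}).
    """
    thetas = np.empty(n_iter + 1)
    thetas[0] = theta0
    for n in range(n_iter):
        gamma = c / (n + 1) ** rho
        u = noise_scale * rng.standard_normal()
        thetas[n + 1] = thetas[n] - gamma * (thetas[n] - theta_star + u)
    return thetas


def averaged_estimate(thetas, burn_in=0):
    """Arithmetic mean of the iterates, discarding the first burn_in of them."""
    return float(np.mean(thetas[burn_in:]))
```

With `noise_scale=0.0` only the effect of the initial condition remains: averaging from the first step keeps a contribution of order $|\theta_0 - \theta^*|/N$, whereas discarding an initial segment removes most of it.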

3. Euler Scheme: Proof of the Main Results

In this section we will assume that (HS) and (HD$_\alpha$) are in force.

3.1. Proof of Theorem 2.1

The proof of Theorem 2.1 is divided into several propositions.



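Proposition 3.1. Denote by $X^{\Delta,0,x} := (X_{t_k}^{\Delta,0,x})_{0 \le k \le N}$ the scheme with time step $\Delta = T/N$, $N \in \mathbb{N}^*$, associated to the diffusion $(SDE_{b,\sigma})$ starting from $x$ at time 0. Assume that the innovations $(U_i)_{i \ge 1}$ of (1.2) satisfy (GC($\beta$)) for some $\beta > 0$. Then, there exists $\varepsilon_\beta > 0$ which only depends on the law $\mu$ such that for all $\lambda \le \min(1, \varepsilon_\beta(2\eta\alpha C_\sigma T\exp(CT))^{-1})$, one has

$$\sup_{0 \le n \le N}\log\left(\mathbb{E}_x\left[\exp(\lambda V^\alpha(X_{t_n}^{\Delta,0,x}))\right]\right) \le \lambda\exp(CT)V^\alpha(x) + \frac12\log\left(\mathbb{E}\left[\exp\left(\lambda 2\eta\alpha C_\sigma T\exp(CT)|U_1|^2\right)\right]\right)$$

with $C := C(b,\sigma,V,\alpha,\Delta) = \alpha(C_VC_b)^{\frac12} + \beta C_\sigma\alpha^2(1+2\eta\Delta)^2(C_V + C_b) + \alpha\eta C_b\Delta$.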
Proof. Using the concavity of $x \mapsto x^\alpha$, $\alpha \in (0,1]$, we have for all $k \ge 0$

A Taylor expansion of order 2 of the function $V$, recalling that $2\eta = \sup_{x \in \mathbb{R}^d}\|\nabla^2 V(x)\| < +\infty$, yields
which together with the previous inequality leads to



VViXf^J.bitk.Xf') ^ .VViX.'^J.aitk.Xftpk 



-I- aA5 



+1 



V'-HXt't) 



v'-^Xt^J 



,b{M^^X^)Mtk^X^)lh+i ^ |a(ife,X/^)C/fe+i|2 



v'-'^ixtj 



From (HD$_\alpha$), for all $(x,u) \in \mathbb{R}^d \times \mathbb{R}^q$, we clearly have $\sup_{t \in [0,T]}|\nabla V(x).b(t,x)| \le (C_VC_b)^{\frac12}V(x)$ and

$\sup_{t \in [0,T]}|\sigma(t,x)u|^2 \le C_\sigma V^{1-\alpha}(x)|u|^2$ which yields



(VF(X,t) + b(X,t)).a(X,t)L/,+ 

v'-Hxt) 



C^ar,A\Uk 



+11 



Using (HDq), Va; G R'' the functions 17(0;, .) : u 1— J- ('^^(^+^(^))^-°'(^)" ^j-g Lipschitz, and more precisely satisfy 



Va; e R , sup 

(«,ti')e(R'' 



m — M 



Hence, from the Cauchy-Schwarz inequality and since the law of the innovations satisfies (GC($\beta$)) for some $\beta > 0$, there exists $\varepsilon_\beta > 0$ such that for $\lambda \le \min(1, \varepsilon_\beta(2\eta\alpha C_\sigma\Delta)^{-1})$, one has



E 



exp(AF"(X,'^^J) Tt, <cxpiXV''{xf'Jil + aiCvCb)--A + ar,CbA')) 



X E 



E [cxpi2Xr]aCaA\Uk+if)\j^t, 



exp(2AaA5(l + 2r,A)giXf^, Uk+i)) 

< cxp(Ay"(X,'^)(l + a{CvCb)iA + ar^CbA^)) 

X exp(A2/3a2^(l + 2r^A)\Cv + Cb)C,V''{Xf'J) x E [exp(2A?yaC,A|C/i p)] ^ 

< cxp(AC(A)l^"(X,'^))E [exp(2A77aaA|C/in] ^ , 



where C(A) := 1 + A (^a(CvCh)5 + l3C^a^{l + 2j]Af{Cv + Cb) + ariC'bAj . Now define 14 
k e {0, • • • , N}. Taking expectation in both sides of the previous inequality clearly implies 



for 



E[exp(AI4+i)] <E[exp(AFfc)]E 



exp ( A 



2T]aCaA 2 
'C(A)fc+i'^^' 
and by a straightforward induction, for $n \in \{0, \cdots, N\}$ we have



E[cxp(AK)] <cxp(Ayo) 



which finally yields, for $\lambda \le \min(1, \varepsilon_\beta(2\eta\alpha C_\sigma\Delta C(\Delta)^n)^{-1})$,



n-l 



E [exp(Ay"(X,^))] < exp(AC(A)"F"(Xo)) \{ E [exp {\2i'jaC,^C{^f+^\Ui\^)] ^ . 

k=0 

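Observe now that $C(\Delta)^N \le \exp(CT)$ with $C := C(b,\sigma,V,\alpha,\Delta) = \alpha(C_VC_b)^{\frac12} + \beta C_\sigma\alpha^2(1+2\eta\Delta)^2(C_V + C_b) + \alpha\eta C_b\Delta$. Using Jensen's inequality, the latter bound clearly provides the following control of the quantity of interest for $\lambda \le \min(1, \varepsilon_\beta(2\eta\alpha C_\sigma T\exp(CT))^{-1})$

$$\sup_{0 \le n \le N}\log\left(\mathbb{E}\left[\exp(\lambda V^\alpha(X_{t_n}))\right]\right) \le \lambda\exp(CT)V^\alpha(X_0) + \frac12\log\left(\mathbb{E}\left[\exp\left(\lambda 2\eta\alpha C_\sigma T\exp(CT)|U_1|^2\right)\right]\right).$$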



□ 



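Corollary 3.1. Under the same assumptions as Proposition 3.1, for all $\alpha \in (\frac12, 1]$, one has

$$\forall\lambda \ge 0, \quad \sup_{0 \le n \le N}\log\left(\mathbb{E}_x\left[\exp(\lambda V^{1-\alpha}(X_{t_n}^\Delta))\right]\right) \le K_{3.1}(\lambda \vee \lambda^{\frac{\alpha}{2\alpha-1}})$$

where $K_{3.1} := \max(\Psi_1(T,\Delta,x,b,\sigma), \Psi_2(T,\Delta,x,b,\sigma))$ and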

l-a 



*i(T,A,x,&, a) 



exp I p e^^V"{x) + -logE[e^^^^ 



\u\^ 



£^<(l-")irn2 



2a 



\u\'- 



K 



*2(r,A,x,&,cr) := — 1 +pl — 1e^'^V°'{x) + ^logE exp 
— a — a 2 

p:= imin(l,e^(2r;aaTexp(CT))-i), 

C C(6, crF, a. A) ^ a{CvCbt' + PC^a^l + 2r/A)2(Cy + Cb) + arjCbA 
K K{y, b, A) = {CyCb)^ + vCbA 

Proof. For A £ [0, 1], one has 

E,[cxp(AFi-"(XiJ)] = 1 + XE^V'-'^iXtJ] + ^E^lV^'-^^^iXtJ] 

k>2 

< 1 + AE,[yi-"(X,J] + A^ lE,[y (X, J 

fc>0 

< exp (A(E,[Fi-"(XtJ] + E,[e^"°(^'")])) , 
Tedious but simple computations, in the spirit of Proposition 13. 11 show that 



K 



[l-a)KT 
with $K := K(V,b,\Delta) = (C_VC_b)^{\frac12} + \eta C_b\Delta$.

Thanks to the following Young inequality, for all p> 0, for all x e R'^, V'^~°'{x) < ^pV^^^x) + ^^p' 
which is valid if a G 1], one has for p = p -.^ i niin(l, e^(277aCcrrexp(Cr))"^) 



sup 'Ea:[e^ < exp( p~^^) sup E:r 

0<ri<JV a 0<n<N 



,2a-l „i 



exp 

1 - a 



Lj^pv^ixt'^-) 



< exp( p exp p e^^V^ix) + -logE[cxp( 



a 



,e^(l_-a) 
2a 



\U\' 



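where we used Proposition 3.1 for the last inequality.

Now, for all $\lambda \ge 1$, using the Young type inequality $\lambda V^{1-\alpha}(X_{t_n}) \le \left(\frac{2\alpha-1}{\alpha}\right)\rho^{-\frac{1-\alpha}{2\alpha-1}}\lambda^{\frac{\alpha}{2\alpha-1}} + \left(\frac{1-\alpha}{\alpha}\right)\rho V^\alpha(X_{t_n})$, valid for all $\rho > 0$ (to be chosen later on) and for all $\alpha \in (\frac12, 1]$, one derives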



E,[exp(AFi-"(XtJ)] < exp (^{^^-l)p-^ ) 
< exp [KX^^) 



exp 



1-a 



pV'iXtJ 



with K{p) := +\og{E^ [exp ((i^) pV"{Xt,J)]) and < niin(l, e0(2r;aaTcxp(CT))-i). We 

select p = (0 in the last inequality to complete the proof and use Proposition l3.1l to bound the quantity K{p). □ 

Corollary 3.2. Under the same assumptions as Proposition 3.1, one has

(A/A3.2)2 



VA 



e [0, A3.2), sup log (e, [exp(AVi/2(xf )) 

0<n<Ar ^ L 



< K- 



3.2 



1 - (A/A3.2) 



whereK3,2 := Xl2exp{CT){2V^^^{x)+2r]aC„'E[\Ui\^]T) anrfAa.a safe/^es E[exp(A§.22?7aC^rexp(Cr)|[/i|2)] < 
2. 

Proof. By definition of $\lambda_{3.2}$, we have $\forall k \ge 1$, $\lambda_{3.2}^{2k}(2\eta\alpha C_\sigma T\exp(CT))^k\mathbb{E}[|U_1|^{2k}] \le 2k!$. Consequently, setting temporarily $C_1 := \exp(CT)V^{1/2}(x)$, $C_2 := 2\eta\alpha C_\sigma T\exp(CT)$ for the sake of simplicity, simple computations show that



\2fc/^fcTpr|rr |2fcl 

logE[exp(A2C2|t/ip)]-A^C2E[|C/ip]=log|l + ^- ^^^l^il J 



< 



E 

k>2 



k>l 

A2*C2^E[|C/i|2^] 



fc! 



X'C2E\\Ui\ 



< 



2k 



2E 

fr2v^3.2 

hence, using Proposition 3.1 for $\alpha = \frac12$ and $\forall\lambda \in [0, \lambda_{3.2})$, we clearly get
sup log (e, [cxp(A2l/i/2(xf ))] )<XlJci + 

0<n<N ^ L -1^ V 



2i-(VA3^2). if A < A3.2, 
+00, otherwise. 



C2E[|[/i|2]\ (A/A3.2)^ 



1 - (A/A3.2) 



< 2A3 2 ( Ci 



C2E[|C/i|2]\ (A/A3.2) 



1-(A/A3.2)' 



This completes the proof. 



□ 
Proposition 3.2. (Control of the Lipschitz modulus of iterative kernels) Denote the Lipschitz modulus of $b$ and $\sigma$ appearing in the diffusion process $(SDE_{b,\sigma})$ by $[b]_1$ and $[\sigma]_1$, respectively. Denote by $P_k$ and $P_{k,p} := P_k \circ \cdots \circ P_{p-1}$, $k, p \in \{0, \cdots, N-1\}$, $k \le p$, the (Feller) transition kernel and the iterative kernels of the Markov chain defined by the scheme (1.2), respectively. Then for all real-valued Lipschitz function $f$ and for all $k, p \in \{0, \cdots, N-1\}$, $k \le p$, the functions $P_{k,p}(f)$ are Lipschitz-continuous and one has

$$[P_{k,p}(f)]_1 := \sup_{(x,x') \in (\mathbb{R}^d)^2,\, x \ne x'}\frac{|P_{k,p}(f)(x) - P_{k,p}(f)(x')|}{|x - x'|} \le [f]_1(1 + C(b,\sigma,\Delta)\Delta)^{\frac{p-k}{2}}$$

where $[f]_1$ stands for the Lipschitz modulus of the function $f$ and $C(b,\sigma,\Delta) = 2[b]_1 + [\sigma]_1^2 + \Delta[b]_1^2$.

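Proof. Using the Cauchy-Schwarz inequality and (HS), for all $(x,y) \in (\mathbb{R}^d)^2$ and for all $k \in \{0, \cdots, N-1\}$, one has

$$|P_k(f)(x) - P_k(f)(y)| \le \mathbb{E}\left[\left|f(x + b(t_k,x)\Delta + \Delta^{\frac12}\sigma(t_k,x)U_1) - f(y + b(t_k,y)\Delta + \Delta^{\frac12}\sigma(t_k,y)U_1)\right|\right]$$

$$\le [f]_1\mathbb{E}\left[\left|x - y + (b(t_k,x) - b(t_k,y))\Delta + \Delta^{\frac12}(\sigma(t_k,x) - \sigma(t_k,y))U_1\right|^2\right]^{\frac12}$$

$$\le [f]_1(1 + C(b,\sigma,\Delta)\Delta)^{\frac12}|x - y|.$$

A straightforward induction argument completes the proof.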



□ 
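The one-step estimate used in this proof can be checked numerically in the scalar case. The sketch below assumes a centred innovation with unit variance, so that the second moment of the difference of two coupled Euler steps has a closed form; the function names and the test coefficients are hypothetical.

```python
import math


def one_step_sq_distance(x, y, b, sigma, dt):
    """E|X_1 - Y_1|^2 for one Euler step from x and from y driven by the same innovation.

    Assumes a scalar innovation U with E[U] = 0 and E[U^2] = 1, so the cross term vanishes.
    """
    drift_part = (x - y) + (b(x) - b(y)) * dt
    noise_part = (sigma(x) - sigma(y)) ** 2 * dt
    return drift_part**2 + noise_part


def lipschitz_bound(x, y, lip_b, lip_sigma, dt):
    """(1 + C * dt) * |x - y|^2 with C = 2 [b]_1 + [sigma]_1^2 + dt [b]_1^2."""
    c = 2.0 * lip_b + lip_sigma**2 + dt * lip_b**2
    return (1.0 + c * dt) * (x - y) ** 2
```

For example, with $b = \sin$ and $\sigma = \cos$ (both 1-Lipschitz), `one_step_sq_distance` never exceeds `lipschitz_bound` with `lip_b = lip_sigma = 1`; taking square roots gives the factor $(1 + C\Delta)^{1/2}$ per step.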



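Proposition 3.3. (Control of the Laplace transform) Denote by $X_T^\Delta$ the value at time $T$ of the scheme (1.2) associated to the diffusion $(SDE_{b,\sigma})$. Assume that the innovations $(U_n)_{n \ge 1}$ in (1.2) satisfy (GC($\beta$)) for some $\beta > 0$. Let $f$ be a real-valued 1-Lipschitz-continuous function defined on $\mathbb{R}^d$. For all $\lambda \ge 0$ and for all $\alpha \in (\frac12, 1]$, one has

$$\mathbb{E}_x[\exp(\lambda f(X_T^\Delta))] \le \exp(\lambda\mathbb{E}_x[f(X_T^\Delta)])\exp\left(K_{3.1}\left(\varphi(T,b,\sigma,\Delta) \vee \varphi(T,b,\sigma,\Delta)^{\frac{\alpha}{2\alpha-1}}\right)\left(\lambda^2 \vee \lambda^{\frac{2\alpha}{2\alpha-1}}\right)\right)$$

with $\varphi(T,b,\sigma,\Delta) := C_\sigma\beta\frac{1 + C(\Delta)\Delta}{4C(\Delta)}e^{3C(\Delta)T}$, $C(\Delta) := 2[b]_1 + [\sigma]_1^2 + \Delta[b]_1^2$.
If $\alpha = \frac12$, for all $\lambda \in [0, \varphi(T,b,\sigma,\Delta)^{-1/2}\lambda_{3.2})$, one has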

E. [exp(A/(^-„] < exp(AE, [/(X-,],»p ( A „ jiMIAf^^^^ 

Proof. As mentioned earlier in the introduction, we begin our proof using that the law $\mu$ of the innovation satisfies (GC($\beta$)) and (HD$_\alpha$). Hence, for $\lambda \ge 0$ and $k \in \{0, \cdots, N-1\}$, one has

$$P_k(\exp(\lambda f))(x) = \mathbb{E}\left[\exp\left(\lambda f\left(x + b(t_k,x)\Delta + \sigma(t_k,x)\Delta^{1/2}U_{k+1}\right)\right)\right]$$

$$\le \exp\left(\lambda P_k(f)(x) + \beta\frac{\lambda^2}{4}[f]_1^2\Delta|\sigma(t_k,x)|^2\right)$$

$$\le \exp\left(\lambda P_k(f)(x) + C_\sigma\beta\frac{\lambda^2}{4}[f]_1^2\Delta V^{1-\alpha}(x)\right). \qquad (3.1)$$

Taking expectation on both sides of the last inequality and using the Hölder inequality with conjugate exponents $(p,q)$ (to be specified later on) leads to



E.T 



exp(A/(X,^^J)l < [cxp(ApPfe(/)(X,^))]^ 



exp 



AX\f]lv'-'^{Xt\ 



(3.2) 
Now, we apply the last inequality for $f := P_{k+1,N}(f)$ and obtain



CXp(AP,+i,Ar(/)(X,^^J) < [cxp(ApP,,Ar(/)(X,^))] ^ E, 



cxp 



-AA^[Pfe+i,^(/)]fF^-"(X,t) 



Consequently, an elementary induction yields 

E, [exp(A/(X#))] = E, [cxp(APjv,w(/)(^t^))] 
<E, [exp(Ap^Po,Ar(/)(a:))]^ 



n 



fc=0 



< exp(AE, [f{X^)] ) cxp ( V sup log (e. 



,%^A2A<;p2"(l+C(A)A)"vi-°(X^^) 



where we used Proposition 13.21 for the last inequality. Observe now that since {p,q) are conjugate exponents, 
we have i 4 = 1(1 - < = 1, so that 

q pi^ q\ p^^ ''1—- — q p— 1 



E, [exp(A/(X^))] < exp(AE, [/(X^)])exp sup log 

\0<n<Af 



(e. 



^A"Aqp2"(l+C(A)A)"vi-°(Jff ) 



e ■» 



Setting $p := 1 + C(\Delta)\Delta$, $q = \frac{p}{p-1} = \frac{1 + C(\Delta)\Delta}{C(\Delta)\Delta}$ and using the straightforward inequality $(1 + C(\Delta)\Delta)^{3N} \le \exp(3C(\Delta)T)$, we derive

E, [exp(A/(X^))] < exp(AE, [/(X^)])exp ( sup log (e, 

\0<n<N ^ 



g 4C(A) '.'^tn-' 



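We set $\varphi(T,b,\sigma,\Delta) := C_\sigma\beta\frac{1 + C(\Delta)\Delta}{4C(\Delta)}e^{3C(\Delta)T}$. For $\alpha \in (\frac12, 1]$, Corollary 3.1 clearly implies

$$\mathbb{E}_x[\exp(\lambda f(X_T^\Delta))] \le \exp(\lambda\mathbb{E}_x[f(X_T^\Delta)])\exp\left(K_{3.1}\left(\varphi(T,b,\sigma,\Delta) \vee \varphi(T,b,\sigma,\Delta)^{\frac{\alpha}{2\alpha-1}}\right)\left(\lambda^2 \vee \lambda^{\frac{2\alpha}{2\alpha-1}}\right)\right)$$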



and for a = i, according to Proposition 13.21 for A < f{T, b, a, A) "'^^^As 2, one has 



3.2. Proof of Theorem 2.2



□ 



We will prove the result for the process $X$ solution of $(SDE_{b,\sigma})$. The proof for the continuous Euler scheme is similar.

Lemma 3.1. Under the assumptions of Theorem 2.2, for all $p \ge 1$, one has

$$\mathbb{E}_x\left[\sup_{0 \le t \le T}|X_t|^{2p}\right] \le 2(1+|x|)^{2p}\exp(26p^2(1 + (C_b \vee C_\sigma)T)).$$

0<t<T 
Proof. Let $g : x \mapsto \sqrt{1 + |x|^2}$ satisfying for all $x \in \mathbb{R}^d$, $\nabla g(x) = g^{-1}(x)x$, $\nabla^2 g(x) = g^{-1}(x)I_d - g^{-3}(x)xx^*$ and $V : x \mapsto g^{2p}(x)$. We apply Itô's formula to the process $V(X_t)$ with $\nabla V(x) = 2pg(x)^{2p-1}\nabla g(x)$ and $\nabla^2 V(x) = 2pg(x)^{2p-1}\nabla^2 g(x) + 2p(2p-1)g(x)^{2p-2}\nabla g(x)\nabla g(x)^*$, noticing that for all $t \in [0,T]$

$$\nabla V(x).b(t,x) + \frac12\mathrm{Tr}(a\nabla^2 V)(t,x) \le 2pC_bg(x)^{2p-1}(1+|x|) + \frac12C_\sigma(1+|x|^2)\|\nabla^2 V(x)\|$$

$$\le 4pC_bg(x)^{2p} + \frac12C_\sigma(1+|x|^2)\left(4pg(x)^{2p-2} + 2p(2p-1)g(x)^{2p-2}\right)$$

$$\le 4p(C_b \vee C_\sigma)g(x)^{2p} + 2p(C_b \vee C_\sigma)g(x)^{2p} + p(2p-1)(C_b \vee C_\sigma)g(x)^{2p}$$

$$\le 8p^2(C_b \vee C_\sigma)V(x)$$

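we clearly obtain,

$$V(X_t^{\tau_m}) \le V(x) + 8p^2(C_b \vee C_\sigma)\int_0^t V(X_s^{\tau_m})\,ds + \int_0^t(\nabla V^*\sigma)(X_s^{\tau_m})\,dW_s, \qquad (3.3)$$

where we classically introduced the stopping time $\tau_m := \inf\{t \ge 0 : |X_t - x| \ge m\}$ for $m \in \mathbb{N}^*$ and the notation $X^{\tau_m} := (X_{t \wedge \tau_m})_{t \ge 0}$. The stochastic integral $M_t^m := \int_0^t(\nabla V^*\sigma)(X_s^{\tau_m})\,dW_s$ defines a continuous martingale so that taking expectation in the previous inequality clearly yields

$$\mathbb{E}_x[V(X_t^{\tau_m})] \le V(x) + 8p^2(C_b \vee C_\sigma)\int_0^t\mathbb{E}_x[V(X_s^{\tau_m})]\,ds.$$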

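Now, using Gronwall's lemma we derive

$$\forall m \in \mathbb{N}^*, \quad \sup_{t \in [0,T]}\mathbb{E}_x[V(X_t^{\tau_m})] \le (1+|x|)^{2p}\exp(8p^2(C_b \vee C_\sigma)T).$$

As $\tau_m \to +\infty$ a.s., as $m \to +\infty$ (since $\sup_{s \in [0,T]}|X_s| < +\infty$), using Fatou's lemma, we finally obtain for all $p \ge 1$

$$\sup_{0 \le t \le T}\mathbb{E}_x[V(X_t)] = \sup_{0 \le t \le T}\mathbb{E}_x[g(X_t)^{2p}] \le (1+|x|)^{2p}\exp(8p^2(C_b \vee C_\sigma)T). \qquad (3.4)$$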

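We then observe that Itô's formula also implies

$$\mathbb{E}_x\left[\sup_{0 \le s \le t}V(X_s^{\tau_m})\right] \le V(x) + 8p^2(C_b \vee C_\sigma)\int_0^t\mathbb{E}_x\left[\sup_{0 \le u \le s}V(X_u^{\tau_m})\right]ds + \mathbb{E}_x[(M_t^m)^*] \qquad (3.5)$$

where $(M_t^m)^* := \sup_{0 \le s \le t}M_s^m$. Combining Jensen's and Doob's inequalities, one clearly gets

$$\mathbb{E}_x[(M_t^m)^*]^2 \le \mathbb{E}_x[((M_t^m)^*)^2] \le 4\mathbb{E}_x[(M_t^m)^2] \le 16p^2C_\sigma\int_0^t\mathbb{E}_x[g(X_s^{\tau_m})^{4p}]\,ds \le 16p^2C_\sigma T(1+|x|)^{4p}\exp(32p^2(C_b \vee C_\sigma)T)$$

where we used $\forall x \in \mathbb{R}^d$, $|\nabla V^*\sigma|^2(x) \le 4p^2C_\sigma g(x)^{4p-2}(1+|x|^2) = 4p^2C_\sigma g(x)^{4p}$ and (3.4) for the last inequality.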
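Consequently, plugging the latter estimate into (3.5), one has for all $t \in [0,T]$

$$\mathbb{E}_x\left[\sup_{0 \le s \le t}V(X_s^{\tau_m})\right] \le V(x) + 4p(C_\sigma T)^{\frac12}(1+|x|)^{2p}\exp(16p^2(C_b \vee C_\sigma)T) + 8p^2(C_b \vee C_\sigma)\int_0^t\mathbb{E}_x\left[\sup_{0 \le u \le s}V(X_u^{\tau_m})\right]ds$$

$$\le (1+|x|)^{2p}\left(1 + 4p(C_\sigma T)^{\frac12}\exp(16p^2(C_b \vee C_\sigma)T)\right) + 8p^2(C_b \vee C_\sigma)\int_0^t\mathbb{E}_x\left[\sup_{0 \le u \le s}V(X_u^{\tau_m})\right]ds$$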
Jo 0<u<s 
so that using Gronwall's lemma and passing to the limit $m \to +\infty$ yields, for all $p \ge 1$,

$$\mathbb{E}_x\left[\sup_{0 \le t \le T}|X_t|^{2p}\right] \le \mathbb{E}_x\left[\sup_{0 \le s \le T}V(X_s)\right] \le 2(1+|x|)^{2p}\exp(26p^2(1 + (C_b \vee C_\sigma)T)).$$

□ 

For all real-valued and 1-Lipschitz function $f$ defined on $\mathcal{C}$ and for all $p \ge 1$, one has

$$\mathbb{E}_x[|f(X) - \mathbb{E}_x[f(X)]|^{2p}] = \mathbb{E}_x[|f(X) - f(0) + f(0) - \mathbb{E}_x[f(X)]|^{2p}] \le 2^{2p}\mathbb{E}_x[\|X\|_\infty^{2p}]$$

$$\le 2^{2p+1}(1+|x|)^{2p}\exp(26p^2(1 + (C_b \vee C_\sigma)T)) \qquad (3.6)$$

where we used Lemma 3.1 for the last inequality. Now, combining the Chebyshev and Rosenthal inequalities for independent zero-mean random variables (see e.g. [JSZ85]), for all $p \ge 1$, there exists $C_{2p} > 0$ such that

T>(^\\-Hxh F [ff.^m-:^^ ^ B,[{EtUnX'')-^A.fiXm E.[|/(X)-E.[/(X)]pP] 

< ,(2(l + N))-'-exp(28p^(l + (ava)r)) 2exp(-^(,)) 

with (^(p) := — «;(5, ct, T)p^ +plog( (2(i'+ja|))^ ) where we used for aU p > 1, < (2p)^P < exp(2p^), see e.g. 
p. 235-236 in |JSZ85| . and p.6p for the last inequality. Optimizing the latter inequality with respect to p with 
p > 1, i.e. selecting p = 2n(b,a,T) (2(i+H))^ )' '^^^ obtain 



+ \x\))- 

for r^Af > (2(1 + exp(2K(6, a, T)). Otherwise, using the Jensen and Rosenthal inequalities, one has for all 
P G [0, 1] 

M M 



E,[(^ f{X^) E,[/(X)])2f] < E.[(^ f{X^) E,[/(X)])2]P < (AfC2E,[|/(X) - ^,[f{X)f]Y 

k=l 

< MP (4(2(1 + \x\)f cxp(k(6, a, T)))" 



k=l k=l 



where we used (|3.6p for the last inequality. Now, noticing that we have 4e < exp(K(6, cr, T)), Chebyshev's 
inequality yields 

(^lE/(^') >r^< ^ < 2^ < 2exp(-^(p)) 

with (p{p) -plog(p)+plog(^), C := (2(1 + exp(2K(6, cr, T) - 1) and where we used that for all p > 0, 
CP < 2{Cp)P since the function p 1-^ 2pP is minimized for p = exp(— 1) and 2cxp(— 1/e)) > 1. Consequently, 
optimizing over p such that p < 1, i.e. selecting p = ^W^, one has 



/ M 
V k=l 



Ce 



>r\ < 2 cxp 



(2(l + |x|))2exp(2«;(6,a,T)) 
for r'^M < Ce (2(1 + cxp(2k(6, ct, T)). This completes the proof. 
4. Stochastic Approximation Algorithm: Proof of the Main Results

Throughout this section we will assume that (HL), (HLS)$_\alpha$ and (HUA) are in force.

4.1. Proof of Theorem 2.3

The proof of Theorem 2.3 is divided into several propositions.

Proposition 4.1. Denote by $\theta := (\theta_n)_{0 \le n \le N}$ the scheme (1.4) with step sequence $\gamma = (\gamma_n)_{1 \le n \le N}$ satisfying (1.5). Assume that the innovations $(U_i)_{i \ge 1}$ of (1.4) satisfy (GC($\beta$)) for some $\beta > 0$. Then, there exists $\varepsilon_\beta > 0$ which only depends on the law $\mu$ such that for all $\lambda \le \min(1, \varepsilon_\beta(8\eta\alpha C_H^2\Pi_{2,N})^{-1})$, one has

N-l / N-1 \ 

sup log {Ee„ [exp(Ai"(0„))]) < {L'^{9o)+C ^ 7fc+i)n2,A.A+ - ^ jl^, log (E [exp (8r?aC^n2,^A|C/|2)]) 



0<n<Af 



, 2 

k=0 \ k=0 



with = n2,N{a) rifco (1 + C^VaCh + W^Dll+i) and C = ^r^aCmW]'']. 

Proof. The proof relies on similar arguments as those used in the proof of Proposition 3.1. Using the concavity of $x \mapsto x^\alpha$, $\alpha \in (0,1]$, a Taylor expansion of order 2 of the function $L$, and finally (HLS)$_\alpha$, for all $k \in \{0, \cdots, N-1\}$, we have
L^{eu+i) - i"(0fe) < aL^-\ek) (VL(0fe).(0,+i - Ok) + ii\ek+i - ^^.p) , 

= -7fc+iaL"-i(0fe) lyL{ek),h{ek)) - -/k+iaL"-\ek) {VL{0k), Uk+i) - h{Bk)) 

+ ar^^l^,L"-\e,)\H{ekMk+i)\', 

+ 2ria^l^,L'-\ek)\h{eu)?. 

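Let us note that (HLS)$_\alpha$ implies that $\forall(\theta,u) \in \mathbb{R}^d \times \mathbb{R}^q$, $|H(\theta,u) - h(\theta)|^2 = |H(\theta,u) - \mathbb{E}[H(\theta,U)]|^2 \le 2C_H^2L^{1-\alpha}(\theta)(\mathbb{E}[|U|^2] + |u|^2)$ which leads to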

< -jk+iaL'^-'iOk) (VL(0,),H(0fe,C/,+i) - h{ek)) + 4mCl^l+in\U\'] + iiiaChl+,\Uk+i\' 

2 ra/ 



2r^aChlk+iL 



Using again (HLS)^, V6' G R'^ the functions g{9, .) : u H' '^^'^^^ are Lipscliitz and more precisely 

satisfy 

yeeR', sup \i^^4^-4^<c^LH9). 

(«,ti')e(R<!)2 \u - u \ 

Consequently, denoting $C = 4\eta\alpha C_H^2\mathbb{E}[|U|^2]$, from the Cauchy-Schwarz inequality and since the law of the innovation satisfies (GC($\beta$)) for some $\beta > 0$, there exists $\varepsilon_\beta > 0$ such that for $\lambda \le \min(1, \varepsilon_\beta(8\eta\alpha C_H^2\gamma_1^2)^{-1})$, one has

E [exp(AL"(0fc+i)| J-fe] < exp(A(l + 27^aChll+i)L"{0k)) exp(C7^+iA)E [exp(-2aA7fc+ig(0fc, Uk+i))\ Tk]'' 

X E [exp(87?aAC^7fc+i|C^fc+in| -^fc] ^ 
< exp(A(l + i2rjaCh + ^C2a)7,Vi)L"(0fc)) exp(C7,%iA)E [exp(8r;aAC^7,%i|f/n] ^ 
In the aim of simplifying notations, we define 112, „ := nfc=o(l + C^VCtCh + ^C'^Ci)^'ij^i) and temporarily set
Lk '■= nr^lT'' fc € {0, • • • ,N}. Taking expectation in both sides of the previous inequality clearly implies 



Ee„ [cxp(ALfe+i)] < E0„ [exp(ALfc)] cxp C-^^A E 
and by a straightforward induction, for n G {0, • • • , N} we have 



cxp SrjaC^ 



Efl 



n— 1 2 



n-1 



fc=0 



[exp(AL„)] < exp(ALo) exp C ^ TT^^ 11 

V k=0 ^•'^+^ ' '— " 

which finally yields for A < min(l, £^(87;aC^7^)~^) 



E 



cxp(877aC2-^A|C/p 



n 



2.k+l 



Be, [exp(AL"(0„))] < exp(n2,„i"(0o)A) exp ( C V -^^^l^^X ] TT E 



2,fe+l 



Up to a modification of a constant, we can assume without loss of generality that $\sup_{0 \le n \le N}\gamma_{n+1} = \gamma_1 \le 1$ so that using Jensen's inequality, the latter bound clearly provides the following control of the quantity of interest for $\lambda \le \min(1, \varepsilon_\beta(8\eta\alpha C_H^2\Pi_{2,N})^{-1})$



sup logfEso 

0<ri<Af ^ 



N-1 



N-1 



,AL°(e„) 



,8j;QC^n2,jvA|;7|- 



fc=0 



fc=0 



□ 



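Corollary 4.1. Assume that the assumptions of Proposition 4.1 are satisfied. Then, for all $\alpha \in (\frac12, 1]$, one has

$$\forall\lambda \ge 0, \quad \sup_{0 \le n \le N}\log\left(\mathbb{E}_{\theta_0}\left[\exp(\lambda L^{1-\alpha}(\theta_n))\right]\right) \le K_{4.1}(\lambda \vee \lambda^{\frac{\alpha}{2\alpha-1}})$$

where $K_{4.1} := \max(\Psi_1(\gamma,\alpha,\theta_0,H), \Psi_2(\gamma,\alpha,\theta_0,H))$ and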

/ N-i \ N-l 



fc=0 



JV-l 



cxp L"(0o)+2«CE7l+i n2. 



k=0 



1-a 1 



/2a- 1 i-c 

p 

\ a 



N-l 



NP- 



fc=0 



2q: - 1 i-a. 



Af-l 



L%e^) + cY,il+i]'^2., 



k=0 



1-a 1 



e 2q 



-\u\' 



N-l 



NP- 



k=0 



a 



k=0 



p= imin(l,£^(877«C2n2,w)-') 

Proof. We only give a sketch of proof since it is rather similar to the one of Corollary 3.1. For $\lambda \in [0,1]$, one has

$$\mathbb{E}_{\theta_0}\left[\exp(\lambda L^{1-\alpha}(\theta_n))\right] \le \exp\left(\lambda\left(\mathbb{E}_{\theta_0}[L^{1-\alpha}(\theta_n)] + \mathbb{E}_{\theta_0}[\exp(L^{1-\alpha}(\theta_n))]\right)\right).$$

Tedious but simple computations in the spirit of Proposition 4.1 easily show that



N-l 



N-1 



sup E,„[Li-"(0„)] < sup Ee„[L"(0„)]'^ < i'-"(^o) + {8vaClE[\U\'] ^ 7^'+i)'^ H (l+2^(l-a)^^/.7.'+i)- 



0<n<Af 



0<n<Af 



Moreover, thanks to the Young type inequality L °'{9) < ^-^pL"{6) 



k=0 



l-g „TafQ\ 1 2a-l 



k=0 



for every (p, 9) e 



R!5_ X R'' and a £ (i, 1] and using Proposition l4.11 one obtains for p = p i min(l, e^(877Q;C^n2_Ar) ^) 



sup iL0nle 

0<ri<iV 



E«Je^"°(''")] < exp 
/ 1 



/2a- 1 i-c 

E 7fc+i ) log (e 
fc=o / ^ 



2 



exp i"(0o) + C5]7.'+i 



1-a 



NP- 



k=0 



SO that for all A G [0, 1] 

EeJexp(ALi-"(0„))] < *i(7,«,^o,i?)A. 
Now, for A > 1, we use the Young-type inequality XL^^"{6n) < ^"^"'" p^ ^"-^ A- 

EeJexp(ALi-"(0„))] < exp(A'A^) 



^-^pL°'{9n) to derive 



with iir(p) := 2<i^p-2Tj^ +logEeo [exp ((1^) pL"(0„))] and i^p < min(l, e^(87?aC2n2,jv)-^). We select 
p = p in the last inequality and use Proposition 14. 1 1 to bound the quantity K{'p). □ 

Proposition 4.2. (Control of the Lipschitz modulus of iterative kernels) Denote by $P_k$ and $P_{k,p} := P_k \circ \cdots \circ P_{p-1}$, $k, p \in \{0, \cdots, N-1\}$, $k \le p$, the (Feller) transition kernel and the iterative kernels of the Markov chain $\theta$ defined by the scheme (1.4). Then for all Lipschitz function $f$ and for all $k, p \in \{0, \cdots, N-1\}$, $k \le p$, the functions $P_{k,p}(f)$ are Lipschitz-continuous and one has

$$[P_{k,p}(f)]_1 := \sup_{(\theta,\theta') \in (\mathbb{R}^d)^2,\, \theta \ne \theta'}\frac{|P_{k,p}(f)(\theta) - P_{k,p}(f)(\theta')|}{|\theta - \theta'|} \le [f]_1\prod_{i=k}^{p-1}\left(1 - 2\lambda\gamma_{i+1} + C_{H,\mu}\gamma_{i+1}^2\right)^{\frac12}$$

where $[f]_1$ stands for the Lipschitz modulus of the function $f$ and $C_{H,\mu} := 2C_H^2(1 + \mathbb{E}[|U|^2])$.
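Proof. Using the Cauchy-Schwarz inequality, (HUA) then (HL), for all $(\theta,\theta') \in (\mathbb{R}^d)^2$, one has

$$|P_k(f)(\theta) - P_k(f)(\theta')| \le \mathbb{E}\left[\left|f(\theta - \gamma_{k+1}H(\theta,U_{k+1})) - f(\theta' - \gamma_{k+1}H(\theta',U_{k+1}))\right|\right]$$

$$\le [f]_1\mathbb{E}\left[\left|\theta - \theta' - \gamma_{k+1}(H(\theta,U_{k+1}) - H(\theta',U_{k+1}))\right|^2\right]^{\frac12}$$

$$\le [f]_1\left(|\theta - \theta'|^2 - 2\gamma_{k+1}\langle\theta - \theta', h(\theta) - h(\theta')\rangle + \gamma_{k+1}^2\mathbb{E}\left[|H(\theta,U_{k+1}) - H(\theta',U_{k+1})|^2\right]\right)^{\frac12}$$

$$\le [f]_1\left(1 - 2\lambda\gamma_{k+1} + 2C_H^2(1 + \mathbb{E}[|U|^2])\gamma_{k+1}^2\right)^{\frac12}|\theta - \theta'|.$$

A straightforward induction argument completes the proof.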

□ 
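As a purely illustrative aside, the product appearing in the Lipschitz bound of Proposition 4.2 is easy to evaluate numerically. The sketch below assumes hypothetical values for $\lambda$, $C_{H,\mu}$ and the step sequence (none of them come from the paper) and only shows how the factor $\prod_{i=k}^{p-1}(1 - 2\lambda\gamma_{i+1} + C_{H,\mu}\gamma_{i+1}^2)^{1/2}$ contracts as steps accumulate.

```python
import math


def lipschitz_modulus_bound(gammas, lam, c_h_mu, k, p):
    """Evaluate prod_{i=k}^{p-1} (1 - 2*lam*gamma_{i+1} + c_h_mu*gamma_{i+1}^2)^{1/2}.

    gammas[i] stores gamma_{i+1}; an empty product (k == p) returns 1.
    """
    bound = 1.0
    for i in range(k, p):
        factor = 1.0 - 2.0 * lam * gammas[i] + c_h_mu * gammas[i] ** 2
        bound *= math.sqrt(factor)
    return bound


# Hypothetical parameters, chosen so that every factor stays in (0, 1).
N = 200
lam, c_h_mu = 1.0, 2.0
gammas = [0.5 / (n + 1) for n in range(N)]

bound_0_N = lipschitz_modulus_bound(gammas, lam, c_h_mu, 0, N)
print(f"bound on [P_0,N(f)]_1 / [f]_1: {bound_0_N:.4f}")
```

With these assumed values each factor is strictly smaller than one, so the bound decreases as more steps are composed, which is the contraction mechanism the induction argument exploits.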

Proposition 4.3. (Control of the Laplace transform) Denote by $\theta_N$ the value at step $N$ of the stochastic approximation algorithm (1.4) with step sequence $\gamma := (\gamma_n)_{n \ge 1}$ satisfying (1.5). Assume that the innovations $(U_n)_{n \ge 1}$ in (1.4) satisfy (GC($\beta$)) for some $\beta > 0$. Let $f$ be a real-valued 1-Lipschitz-continuous function defined on $\mathbb{R}^d$. Then, for all $\lambda \ge 0$, for all $N \ge 1$, for all $\alpha \in (\frac{1}{2}, 1]$, one has

VA > 0, EeJexp(A/(0jv))] < exp (EeJA/(0Ar))]) exp (^ipo.{l,H,eo){ClX^ V Cj^"X^)) 
with the two concentration rates CJq := X^^o^ 7fc+i Hi" ' ^^^^ ^i,N '■= 11^0^(1 ^ 2A7fe+i + GH,tilk+i) '^"■'^ 



C^i^'" :=Efc=o (^rf )^((fc+l)log'(fc+4))^ for all N > 1 and where ip^{j, H^Oo) if4.i2^^V 
fM)2^ exn f ^ V'^-i 1 

V 4 ) \^2q-1 Z^A;=0 (fe+1) log2(fc+4) ^ ' 

If $\alpha = \frac{1}{2}$ then there exist two positive constants $\lambda_{4.1}$ and $\varphi_{1/2}(\gamma, H, \theta_0)$ such that



VA e [0, A4.i/,5Ar), Ee„[cxp(A/(0jv))] < exp (AEeJ/(^?jv)]) cxp (2^1/2(7, go)C;(r ^ ^m^^'/I' ^ 



TOi/i SAT := maxo<fc<Ar_i(fc + 1)^/2 log(fc + 4)7^+1 (^77) ' cxp(X;^=, ^ 



=0 {p+l)log^p+4)'- 

Proof. The proof relies on arguments similar to those used in the proof of Proposition 3.3. For $\lambda \ge 0$ and $k \in \{0, \cdots, N-1\}$, one has



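\[
P_k(\exp(\lambda f))(\theta) \le \exp\Big(\lambda P_k(f)(\theta) + \frac{\lambda^2}{4}\beta\gamma_{k+1}^2 [f]_1^2 C_\alpha^2 L^{1-\alpha}(\theta)\Big).
\]

Taking expectation on both sides of the last inequality with $\theta = \theta_k$ and applying the Hölder inequality with conjugate exponents $(p_k, q_k)$ (to be fixed later on), one obtains

\[
\mathbb{E}_{\theta_0}[\exp(\lambda f(\theta_{k+1}))] \le \mathbb{E}_{\theta_0}\big[\exp(\lambda p_k P_k(f)(\theta_k))\big]^{\frac{1}{p_k}}\, \mathbb{E}_{\theta_0}\Big[\exp\Big(q_k \frac{\lambda^2}{4}\beta\gamma_{k+1}^2 [f]_1^2 C_\alpha^2 L^{1-\alpha}(\theta_k)\Big)\Big]^{\frac{1}{q_k}}
\]

and applying the last inequality to $f := P_{k+1,N}(f)$ yields

\[
\mathbb{E}_{\theta_0}[\exp(\lambda P_{k+1,N}(f)(\theta_{k+1}))] \le \mathbb{E}_{\theta_0}\big[\exp(\lambda p_k P_{k,N}(f)(\theta_k))\big]^{\frac{1}{p_k}}\, \mathbb{E}_{\theta_0}\Big[\exp\Big(q_k \frac{\lambda^2}{4}\beta\gamma_{k+1}^2 [P_{k+1,N}(f)]_1^2 C_\alpha^2 L^{1-\alpha}(\theta_k)\Big)\Big]^{\frac{1}{q_k}}. \tag{4.1}
\]

We use Corollary 4.1 to obtain for $\alpha \in (\frac{1}{2}, 1]$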



Efl 



exp ( qk'-jf3lhi[Pk+i,Nif)]lClL'-''iek) 



'^<expU.i^v(^"^^ 



x{^l^,[Pk+,,N{f}]lx'yrkll'[Pk+i,N{f)yr'qr'>^^) 
■■= h{X) 



where we temporarily set /fc (A) exp [k^,^^ V (^) " (7I+1 [ft+i,Ar(/)]?A2 V 7,^;!^ [Pk+iMif )]^'-' ql^^' \ 
for all $\lambda \ge 0$ in the interests of simplifying notation and analysis. Now, an elementary induction argument leads to



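\[
\begin{aligned}
\mathbb{E}_{\theta_0}[\exp(\lambda f(\theta_N))] &= \mathbb{E}_{\theta_0}[\exp(\lambda P_{N,N}f(\theta_N))] \\
&\le \mathbb{E}_{\theta_0}\Big[\exp\Big(\lambda \prod_{k=0}^{N-1} p_k\, P_{0,N}(f)(\theta_0)\Big)\Big]^{\prod_{k=0}^{N-1}\frac{1}{p_k}}\ \prod_{k=0}^{N-1} h_{N-k-1}\Big(\lambda \prod_{i=1}^{k} p_{N-i}\Big)^{\prod_{i=1}^{k}\frac{1}{p_{N-i}}}.
\end{aligned} \tag{4.2}
\]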



We select $p_k := 1 + \frac{1}{(k+1)\log^2(k+4)}$, $q_k = \big(1 + \frac{1}{(k+1)\log^2(k+4)}\big)(k+1)\log^2(k+4) \le 2(k+1)\log^2(k+4)$, $k = 0, \cdots, N-1$, so that $\prod_{k=0}^{N-1} p_k$ converges and more precisely we have $\prod_{k=0}^{N-1} p_k \le \exp\big(\sum_{k=0}^{N-1} \frac{1}{(k+1)\log^2(k+4)}\big) < +\infty$.
We introduce for sake of simplicity 1^0,(7, H, 9a) := Ki,i2'^ ^ V exp {^^^ Y.k=o (fc+i) iog^(fc+4) 
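The role of the Hölder exponents chosen in this proof can be checked numerically. The following sketch (the helper names `weight`, `p_exp` and `q_exp` are ours, introduced only for illustration) verifies that $p_k = 1 + 1/((k+1)\log^2(k+4))$ and $q_k = 1 + (k+1)\log^2(k+4)$ are conjugate, that $q_k \le 2(k+1)\log^2(k+4)$, and that $\sum_k \log p_k$ is dominated by the series $\sum_k 1/((k+1)\log^2(k+4))$.

```python
import math


def weight(k):
    """(k + 1) * log^2(k + 4), the quantity that drives p_k and q_k."""
    return (k + 1) * math.log(k + 4) ** 2


def p_exp(k):
    """Hoelder exponent p_k = 1 + 1 / ((k + 1) log^2(k + 4))."""
    return 1.0 + 1.0 / weight(k)


def q_exp(k):
    """Conjugate exponent q_k = p_k / (p_k - 1) = 1 + (k + 1) log^2(k + 4)."""
    return 1.0 + weight(k)


N = 10_000
log_prod = sum(math.log(p_exp(k)) for k in range(N))   # log of prod_k p_k
series = sum(1.0 / weight(k) for k in range(N))        # sum_k 1/((k+1) log^2(k+4))

print(f"prod p_k = {math.exp(log_prod):.4f} <= exp(series) = {math.exp(series):.4f}")
```

The partial sums of the series stabilise quickly, which is why the product of the $p_k$ only contributes a constant factor to the final bound.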



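Now, using Proposition 4.2 and Corollary 4.1 we easily derive from (4.2)

\[
\forall \lambda \ge 0, \quad \mathbb{E}_{\theta_0}[\exp(\lambda f(\theta_N))] \le \exp\big(\lambda\, \mathbb{E}_{\theta_0}[f(\theta_N)]\big) \exp\Big(\varphi_\alpha(\gamma, H, \theta_0)\big(C_N \lambda^2 \vee \tilde{C}_N^{\alpha} \lambda^{\frac{2\alpha}{2\alpha-1}}\big)\Big)
\]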



with cr := Ef=oSfc;r 1) W(fc+4))^. 

For $\alpha = \frac{1}{2}$ we start from (4.1). First, we use the control obtained in Proposition 4.1 to derive



E«„ exp (^g,-/372+i[P,+i,Ar(/)]?C?/2L^(0fc)jJ < exp ( U5(0o) + C ^ 7^+1 J n2,jv (1/2) ^^7fc\i[^fc+i,w(/)]?A2 



pel, 



p=0 



X logE [exp (/37?C4/2n2,jv (1/2) gfe72+i[Pfc+i,jv(/)]?A2|C/p 



To simplify the latter bound, that is, to obtain an explicit and computable formula for the second term appearing on the right-hand side, we will need the following lemma:

Lemma 4.1. For all $\lambda \in [0, \lambda_{4.1}/s_N^{1/2})$, one has

logE [exp (^Pi^Ct/,n2,N {l/2)qk^l+,[Pk+i.N{f)]lX'\Uf)] < fivCt/2^2,N {l/2)-E[\uWkll+APk+i,N{f)]lX^ 

1 - (-^Sat /A4.1) 



with 



SN ■■= maxo<fc<Ar_i Qkll+iTrf^ ""'^ ^^4.1 satisfies E[exp(A| ;^/377Cf/2n2,w (1/2) |[/p)] < 2. 
Proof. The proof is similar to that of an earlier corollary. By definition of $\lambda_{4.1}$, (/377Cf/2n2,w(l/2)/2)PE[|C/|2p] <
2p\, Vp > 1. Hence, setting $C_1 := \beta\eta C_{1/2}^2 \Pi_{2,N}(1/2)$ we easily deduce,

logE [e^=c^i^'.7.%.[P.+,«(/)]?|r/P] _ A2Cig,7.%i[Pfc+i,iv(/)]?E[|C/n 



p>2 

p>2 



''^4.1 



< 



+00, otherwise. 



This completes the proof. 



□ 



Using the previous lemma, we obtain for all $\lambda \in [0, \lambda_{4.1}/s_N^{1/2})$,



Eo 



exp ( qk^p^l^,[Pk+,M{mC^,/2LHdk) 



<exp(vE'(iV,7,0o)7'+i[ft+i.iv(/)]?A2 



(A/A4.i)2 



1-(a4/VA4.i), 



where we introduced the notation 1'(Ar,^,0o) (^^ (^0) + C^^Jo' 7p+i) n2,N (1/2) ^^+P'nC^/^Ii2.N (1/2)E[|[/|2 
Now, as for $\alpha \in (\frac{1}{2}, 1]$, an induction argument in the spirit of (4.2) yields for all $\lambda \in [0, \lambda_{4.1}/\tilde{s}_N)$



E, 



1 



, p=0 



^2., (A/A4.i)2 



(AsAr/A4.i) / ' 

< exp(AEeJ/(0^)])cxp ( 2^,/2{l.HMCl ({X/Xi.i? V ^ ^';;r-y. , 

V V 1 ^ (AsAf/A4.ij 

= cxp (AEe„ [/(^at)]) cxp (^2^1/2(7, H, 0o)C 



(A/A4.i)^ 



with (^1/2(7, i/, 6*0) := exp(X;f=o^ (fc+i)iog^(fc+4) )(^l.i^(^'7,^o)+E^=oSp+i), Sjv := s]^^ eMT,k=o (fc+i) iog^(fc+4) )' 
and where we used again $\prod_{k=0}^{N-1} p_k \le \exp\big(\sum_{k=0}^{N-1} \frac{1}{(k+1)\log^2(k+4)}\big)$.



□ 



In contrast to Euler-like schemes, a bias appears in the non-asymptotic deviation bound for the stochastic
approximation algorithm. Consequently, it is crucial to control it. At step $n$ of the algorithm, it is
given by $\delta_n := \mathbb{E}[|\theta_n - \theta^*|]$. Under the current assumptions (HL), (HLS)$_\alpha$, (HUA), we have the following
proposition.
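Proposition 4.4 (Control of the bias). For all $n \ge 1$, we have

\[
\delta_n \le \exp\big(-\lambda\Gamma_{1,n} + C_{\alpha,\mu}\Gamma_{2,n}\big)|\theta_0 - \theta^*| + (2C_{\alpha,\mu})^{1/2} \Big(\sum_{k=0}^{n-1} \gamma_{k+1}^2 \exp\big(-2\lambda(\Gamma_{1,n} - \Gamma_{1,k+1}) + 2C_{\alpha,\mu}(\Gamma_{2,n} - \Gamma_{2,k+1})\big)\Big)^{1/2}
\]

where $\Gamma_{1,n} := \sum_{k=1}^{n} \gamma_k$, $\Gamma_{2,n} := \sum_{k=1}^{n} \gamma_k^2$, $C_{\alpha,\mu} := \frac{\lambda^2}{2} + 2C_\alpha^2 K\, \mathbb{E}[|U|^2]$ with $K > 0$.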

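Proof. With the notations of Section 1.2, we define for all $n \ge 1$, $\Delta M_n := h(\theta_{n-1}) - H(\theta_{n-1}, U_n) = \mathbb{E}[H(\theta_{n-1}, U_n) \mid \mathcal{F}_{n-1}] - H(\theta_{n-1}, U_n)$. Recalling that $(U_n)_{n \ge 1}$ is a sequence of i.i.d. random variables, we have that $(\Delta M_n)_{n \ge 1}$ is a sequence of martingale increments w.r.t. the natural filtration $\mathcal{F} := (\mathcal{F}_n := \sigma(\theta_0, U_1, \cdots, U_n);\ n \ge 1)$.

From the dynamic (1.4), we now write for all $n \ge 0$,

\[
\begin{aligned}
z_{n+1} := \theta_{n+1} - \theta^* &= \theta_n - \theta^* - \gamma_{n+1}\{h(\theta_n) - \Delta M_{n+1}\} \\
&= \theta_n - \theta^* - \gamma_{n+1} \int_0^1 d\lambda\, Dh(\theta^* + \lambda(\theta_n - \theta^*))(\theta_n - \theta^*) + \gamma_{n+1}\Delta M_{n+1},
\end{aligned}
\]

where we used that $h(\theta^*) = 0$ for the last equality. Setting $J_n := \int_0^1 d\lambda\, Dh(\theta^* + \lambda(\theta_n - \theta^*))$, we obtain $z_{n+1} = (I - \gamma_{n+1}J_n)z_n + \gamma_{n+1}\Delta M_{n+1}$, which yields

\[
\begin{aligned}
\mathbb{E}_{\theta_0}[|z_{n+1}|^2] &= \mathbb{E}_{\theta_0}[|(I - \gamma_{n+1}J_n)z_n|^2] + 2\gamma_{n+1}\mathbb{E}_{\theta_0}[\langle (I - \gamma_{n+1}J_n)z_n, \Delta M_{n+1}\rangle] + \gamma_{n+1}^2\mathbb{E}_{\theta_0}[|\Delta M_{n+1}|^2] \\
&= \mathbb{E}_{\theta_0}[|(I - \gamma_{n+1}J_n)z_n|^2] + \gamma_{n+1}^2\mathbb{E}_{\theta_0}[|\Delta M_{n+1}|^2].
\end{aligned}
\]

From assumption (HLS)$_\alpha$, we deduce that $\forall (\theta, u) \in \mathbb{R}^d \times \mathbb{R}^q$, $|h(\theta) - H(\theta, u)|^2 \le 2C_\alpha^2 L^{1-\alpha}(\theta)(\mathbb{E}[|U|^2] + |u|^2)$, which combined with the independence of $\theta_n$ and $U_{n+1}$ clearly implies

\[
\mathbb{E}_{\theta_0}[|h(\theta_n) - H(\theta_n, U_{n+1})|^2] \le 4C_\alpha^2\, \mathbb{E}[|U|^2]\, \mathbb{E}_{\theta_0}[L^{1-\alpha}(\theta_n)].
\]

Now, let us notice that $L$ has sub-quadratic growth so that there exists a constant $K > 0$ such that

\[
\mathbb{E}_{\theta_0}[|\Delta M_{n+1}|^2] = \mathbb{E}_{\theta_0}[|h(\theta_n) - H(\theta_n, U_{n+1})|^2] \le 4C_\alpha^2\, \mathbb{E}[|U|^2]\, \mathbb{E}_{\theta_0}[L^{1-\alpha}(\theta_n)] \le 4KC_\alpha^2\, \mathbb{E}[|U|^2](1 + \mathbb{E}_{\theta_0}[|z_n|^2]),
\]

which provides the following bound

\[
\begin{aligned}
\mathbb{E}_{\theta_0}[|z_{n+1}|^2] &\le (1 - \lambda\gamma_{n+1})^2\, \mathbb{E}_{\theta_0}[|z_n|^2] + 4KC_\alpha^2\, \mathbb{E}[|U|^2]\gamma_{n+1}^2 (1 + \mathbb{E}_{\theta_0}[|z_n|^2]) \\
&\le (1 - 2\lambda\gamma_{n+1} + 2C_{\alpha,\mu}\gamma_{n+1}^2)\, \mathbb{E}_{\theta_0}[|z_n|^2] + 2C_{\alpha,\mu}\gamma_{n+1}^2.
\end{aligned}
\]

Temporarily setting $\Pi_n := \prod_{p=0}^{n-1}(1 - 2\lambda\gamma_{p+1} + 2C_{\alpha,\mu}\gamma_{p+1}^2)$, a straightforward induction argument provides

\[
\begin{aligned}
\mathbb{E}_{\theta_0}[|z_n|^2] &\le \Pi_n |\theta_0 - \theta^*|^2 + 2C_{\alpha,\mu} \sum_{k=0}^{n-1} \gamma_{k+1}^2\, \Pi_n \Pi_{k+1}^{-1} \\
&\le e^{-2\lambda\Gamma_{1,n} + 2C_{\alpha,\mu}\Gamma_{2,n}} |\theta_0 - \theta^*|^2 + 2C_{\alpha,\mu} \sum_{k=0}^{n-1} \gamma_{k+1}^2\, e^{-2\lambda(\Gamma_{1,n} - \Gamma_{1,k+1}) + 2C_{\alpha,\mu}(\Gamma_{2,n} - \Gamma_{2,k+1})},
\end{aligned}
\]

where we used the elementary inequality $1 + x \le \exp(x)$, $x \in \mathbb{R}$. This completes the proof.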
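
The induction step in the proof of Proposition 4.4 can be illustrated numerically. The sketch below is not taken from the paper: the function name `bias_bounds`, the step sequence and the values standing in for $\lambda$, $C_{\alpha,\mu}$ and $|\theta_0 - \theta^*|$ are all assumptions. It iterates the recursion $a_{n+1} = (1 - 2\lambda\gamma_{n+1} + 2C_{\alpha,\mu}\gamma_{n+1}^2)a_n + 2C_{\alpha,\mu}\gamma_{n+1}^2$ and compares the result with the closed-form exponential bound and with the resulting bound on $\delta_n$.

```python
import math


def bias_bounds(gammas, lam, c, z0_norm):
    """Return (rec, closed, bias) for n = len(gammas) steps.

    gammas[k] plays the role of gamma_{k+1}; c plays the role of C_{alpha,mu}.
    rec    : bound on E|z_n|^2 obtained by iterating the one-step recursion,
    closed : its relaxation obtained through 1 + x <= exp(x),
    bias   : the corresponding bound on delta_n = E|z_n|.
    """
    n = len(gammas)
    rec = z0_norm ** 2
    for g in gammas:
        rec = (1.0 - 2.0 * lam * g + 2.0 * c * g * g) * rec + 2.0 * c * g * g

    # Partial sums Gamma_{1,j} and Gamma_{2,j} for j = 0, ..., n.
    g1, g2 = [0.0], [0.0]
    for g in gammas:
        g1.append(g1[-1] + g)
        g2.append(g2[-1] + g * g)

    head = math.exp(-lam * g1[n] + c * g2[n]) * z0_norm
    tail = sum(
        gammas[k] ** 2
        * math.exp(-2.0 * lam * (g1[n] - g1[k + 1]) + 2.0 * c * (g2[n] - g2[k + 1]))
        for k in range(n)
    )
    closed = head ** 2 + 2.0 * c * tail
    bias = head + math.sqrt(2.0 * c * tail)
    return rec, closed, bias


# Hypothetical step sequence and constants.
gammas = [0.5 / (k + 1) ** 0.75 for k in range(500)]
rec, closed, bias = bias_bounds(gammas, lam=1.0, c=1.0, z0_norm=2.0)
print(f"recursion: {rec:.5f}, closed form: {closed:.5f}, bias bound: {bias:.5f}")
```

The recursion value never exceeds the closed form, and the bias bound follows from $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$.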
4.2. Proof of Theorem 2.3

Proposition 4.5. (Control of the Lipschitz modulus of iterative kernels) Denote by $K_k$ and $K_{k,p} = K_k \circ \cdots \circ K_{p-1}$, $k, p \in \{0, \cdots, N-1\}$, $k \le p$, the (Feller) transition kernel and the iterative kernels of the Markov chain $z = (\bar\theta, \theta)$ defined by the scheme (1.4), (1.8). Let $f : \mathbb{R}^d \to \mathbb{R}$ be a 1-Lipschitz function. Then for all $k, p \in \{0, \cdots, N-1\}$, $k \le p$, the functions $K_{k,p}(f) : z \mapsto \mathbb{E}[f(\bar\theta_{p+1}) \mid z_k = z]$ are Lipschitz-continuous. In particular, for all $(z, z') \in (\mathbb{R}^d \times \mathbb{R}^d)^2$, one has



|X,,,(/)(z)-X.,,(/)(z')l<^l^i-4l + ^ E (w^) 

P + i p + i^.^^vni,W 



\Z2 - Z^l 



where IIi^p = 11^=0(1 ^ 2A7fc+i + Ch,p.%+i)- 

Proof. Let {z,z') G (R'' x R'')^. Wc denote by z^'^ = Op'^i and z^'^ = Qp'^ the values at step p of the two 
components of the stochastic approximation algorithm $(z_n)_{n \ge 0}$ starting at point $z$ at step $k$. Using (1.8) and a
straightforward induction, one easily derives



ak z fc 1 1 V -v /j. 

j=k+i 



so that taking conditional expectation in the previous equality and using Proposition 4.2 we obtain

n + 1 P + 1 



^fc + 1, ,,, 1 x^/^Hi 



|Z2 - Z2I 



□ 



Let $k \in \{0, \cdots, N-1\}$ and $f$ be a real-valued 1-Lipschitz function defined on $\mathbb{R}^d$. Using that the law of the
innovations of the scheme satisfies (GC($\beta$)), for all $\lambda \ge 0$, one has



E [cxp{XKk,N-if{zk))\ Zk-i = z] = E 



cxp ( XR'k.N-ifi^^ek + -^^k.dk) 



<exp(AAVi,^_i(/)(z))exp(A2^[5]?) 



9k,0k-i) = (21,2:2) 



where g : u Kk,N-i{f) [j^zi + -i^Z2 - -^H{z2,u),Z2 - jkH{z2,u)j . Combining Proposition and 
(HLS)q, one easily obtains 



1 1 ^Wn 



[9h<C^L^(z2hk\j^ + j^ E U 



so we deduce that 

E[exp{XKk^N-if{zk))\zk^i] < exp(Aiffc_i,w„i(/)(zfc_i))exp ( ^ClL^-''{zk-i)fk.N 



,2/^^2 j-l-ct/, \~,2 
where we introduced the notation jk,N ■= 77- (^1 + J2j=k+i (Hi j/Hi^fc) ^ j . Hence, taking expectation in the
previous inequality and using the Hölder inequality with conjugate exponents $(p_k, q_k)$, one clearly gets



[exp(AXfc,Ar_i(/)(0fe))] < EeJcxp(Apfeii:fe_i,Ar_i(/)(zfe_i))]'''= Ee^ 



exp ( \^^ClqkL'~^{eu-i)^l^ 



Similarly to the proof of Proposition 4.3 we set $p_k = 1 + \frac{1}{(k+1)\log^2(k+4)}$, $q_k = \big(1 + \frac{1}{(k+1)\log^2(k+4)}\big)(k+1)\log^2(k+4) \le 2(k+1)\log^2(k+4)$ and use Corollary 4.1 to obtain for $\alpha \in (\frac{1}{2}, 1]$



Efl 



expl A2^C2g,Li-"(0fe_i)7iA. 



An elementary induction argument allows us to conclude

Eso [exp(A/(^jv))] = Ee„ [exp(Aifjv-i,Jv-i(/)(;2iv-i))] 

< exp(AEeJ/(^^])cxp {ip^{^,H,eo){Cl\^ V C^^'^A^) 

with C]^ := J2k=i Ik N' ^N°' J2k=i 7fclv ^ ((^ + 1) log^(^ + 4)) ^^^^ and where wc again introduced, for sake 
of clarity, the constant ipai^, H, 9q) A'4.i25^^^ V ^"=0' TFTiyisbiM^ . 

For $\alpha = \frac{1}{2}$, similarly to the proof of Proposition 4.3 (actually using Lemma 4.1 again), we derive for all
$\lambda \in [0, \lambda_{4.1}/s_N^{1/2})$



Efl 



N-1 



< 



(A/A4.i)2 



exp «'(iV,7,0o)7fc.ivA' + iI+iH,n 1/2,, , 

\ p=o 1 - (Asw /A4.1), 



with SN := maxi<fc<Ar_i(fc + l)log2(fc + 4)7|,v, *(A^,7,^o) := (i^ (^?o) + C E^Jo' 7p+i) n2,iv (1/2) ^ + 



P'qC'li2^'i.N (l/2)E[|t/| ] and an elementary induction argument clearly yields 

VA e [0,A4.i/sjv), EeJexp(A/(0Ar))] < exp (AE(,„ [/(fl^)]) exp (^2(^1/2(7, -ff, ^o)^: 



(A/A4.1 



1 - (AsAr/A4.l) 



with SAT := s]y^exp(Ef^o^ (fc+i) ioV(fc+4) ) ipi/2il,H,9o) := cxp{J2k=o (fc+i) iog^(fc+4) )(Ali^(^^ 7, ^0) + 

Appendix A. Technical results 

A.1. Proof of Proposition 1.3

Let $e_\sigma(x) := \frac{1}{2\sigma}\exp(-|x|/\sigma)$ be the density of the exponential distribution with variance $2\sigma^2$ on $\mathbb{R}$. If $\mu$ is a
probability measure on $\mathbb{R}^d$, we define $\mu^\sigma$ as the convolution of $\mu$ with $e_\sigma^{\otimes d}$, that is

\[
\mu^\sigma(dx) := \Big(\int \prod_{i=1}^{d} \frac{1}{2\sigma}\exp(-|x_i - y_i|/\sigma)\, \mu(dy)\Big)\, dx.
\]

Lemma A.1. If $\mu$ is a probability measure on $\mathbb{R}^d$ with finite first moment, then $W_1(\mu, \mu^\sigma) \le \sqrt{2d}\,\sigma$.

Proof. Let $X$ and $Y$ be independent random vectors with laws $\mu$ and $e_\sigma^{\otimes d}$ respectively. Then $(X, X + Y)$ is a
coupling of $\mu$ and $\mu^\sigma$, and

\[
W_1(\mu, \mu^\sigma) \le \mathbb{E}[|Y|] \le \mathbb{E}[|Y|^2]^{1/2} \le \sqrt{2d}\,\sigma.
\]

□ 
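
The moment computation behind Lemma A.1 can be checked by simulation. The sketch below is only an illustration with assumed values of $d$ and $\sigma$: it samples $Y$ with i.i.d. coordinates of density $\frac{1}{2\sigma}\exp(-|x|/\sigma)$ (each generated as the difference of two independent exponential variables with mean $\sigma$) and compares Monte Carlo estimates of $\mathbb{E}[|Y|]$ and $\mathbb{E}[|Y|^2]$ with $\sqrt{2d}\,\sigma$ and $2d\sigma^2$.

```python
import math
import random


def laplace_vector(d, sigma, rng):
    """Sample d i.i.d. coordinates with density exp(-|x|/sigma) / (2*sigma).

    The difference of two independent exponentials with mean sigma has
    exactly this (Laplace) density.
    """
    return [rng.expovariate(1.0 / sigma) - rng.expovariate(1.0 / sigma) for _ in range(d)]


def estimate_moments(d, sigma, n_samples, seed=0):
    """Monte Carlo estimates of E|Y| and E|Y|^2 for the Euclidean norm."""
    rng = random.Random(seed)
    sum_norm = 0.0
    sum_sq = 0.0
    for _ in range(n_samples):
        y = laplace_vector(d, sigma, rng)
        sq = sum(c * c for c in y)
        sum_norm += math.sqrt(sq)
        sum_sq += sq
    return sum_norm / n_samples, sum_sq / n_samples


d, sigma = 3, 0.5  # hypothetical dimension and smoothing scale
mean_norm, mean_sq = estimate_moments(d, sigma, 20000)
bound = math.sqrt(2 * d) * sigma
print(f"E|Y| ~ {mean_norm:.4f} <= sqrt(2d)*sigma = {bound:.4f}; E|Y|^2 ~ {mean_sq:.4f}")
```

The estimate of $\mathbb{E}[|Y|]$ sits strictly below $\sqrt{2d}\,\sigma$ because of the Jensen step, while the second moment matches $2d\sigma^2$ up to sampling error.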



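We therefore have the bound

\[
W_1(\mu_n, \mu) \le \sqrt{8d}\,\sigma + W_1(\mu_n^\sigma, \mu^\sigma), \tag{A.3}
\]

so what is left is to bound $\mathbb{E}[W_1(\mu_n^\sigma, \mu^\sigma)]$ and to optimize in $\sigma$.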

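The density of $\mu_n^\sigma$ with respect to the Lebesgue measure is given by $g_{1,\sigma,n}(x) := \frac{1}{n}\sum_{i=1}^{n} e_\sigma^{\otimes d}(x - X_i)$, and the
density of $\mu^\sigma$ is $g_{2,\sigma}(x) := \mathbb{E}_\mu(e_\sigma^{\otimes d}(x - X))$.

By the Kantorovich-Rubinstein duality formula, we have

\[
W_1(\mu_n^\sigma, \mu^\sigma) = \sup_{f : [f]_1 \le 1} \int f(x) g_{1,\sigma,n}(x)\, dx - \int f(x) g_{2,\sigma}(x)\, dx \le \int |x|\, |g_{1,\sigma,n}(x) - g_{2,\sigma}(x)|\, dx.
\]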



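To bound this quantity, we shall use the following Carlson-type inequality: for any nonnegative measurable
function $f$ on $\mathbb{R}^d$, we have

\[
\int f(x)\, dx \le C_d \sqrt{\int (1 + |x|^{d+1}) f(x)^2\, dx}, \qquad C_d := \sqrt{\int \frac{1}{1 + |x|^{d+1}}\, dx}.
\]

This can be proved by using Jensen's inequality with the finite measure $\frac{1}{1 + |x|^{d+1}}\, dx$. Using this inequality, we
get the bound

\[
W_1(\mu_n^\sigma, \mu^\sigma) \le C_d \sqrt{\int (1 + |x|^{d+1}) |x|^2\, |g_{1,\sigma,n}(x) - g_{2,\sigma}(x)|^2\, dx} \le C_d \sqrt{\int (1 + 2|x|^{d+3})\, |g_{1,\sigma,n}(x) - g_{2,\sigma}(x)|^2\, dx}.
\]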



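Therefore,

\[
\begin{aligned}
\mathbb{E}[W_1(\mu_n^\sigma, \mu^\sigma)] &\le C_d \sqrt{\int (1 + 2|x|^{d+3})\, \mathbb{E}\Big[\Big|\frac{1}{n}\sum_{i=1}^{n} e_\sigma^{\otimes d}(x - X_i) - \mathbb{E}_\mu\big(e_\sigma^{\otimes d}(x - X)\big)\Big|^2\Big]\, dx} \\
&= C_d \sqrt{\int (1 + 2|x|^{d+3})\, \frac{1}{n} \mathrm{Var}_\mu\big(e_\sigma^{\otimes d}(x - X)\big)\, dx} \\
&\le \frac{C_d}{\sqrt{n}} \sqrt{\int (1 + 2|x|^{d+3})\, \mathbb{E}\big[e_\sigma^{\otimes d}(x - X)^2\big]\, dx}.
\end{aligned}
\]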
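Note that $e_\sigma^{\otimes d}(x)^2 = 2^{-2d}\sigma^{-d} e_{\sigma/2}^{\otimes d}(x)$, so that we get

\[
\begin{aligned}
\mathbb{E}[W_1(\mu_n^\sigma, \mu^\sigma)] &\le \frac{C_d}{2^d \sigma^{d/2}\sqrt{n}} \sqrt{\int\!\!\int (1 + 2|x|^{d+3})\, e_{\sigma/2}^{\otimes d}(x - y)\, dx\, \mu(dy)} \\
&= \frac{C_d}{2^d \sigma^{d/2}\sqrt{n}} \sqrt{\int\!\!\int (1 + 2|u + y|^{d+3})\, e_{\sigma/2}^{\otimes d}(u)\, du\, \mu(dy)} \\
&\le \frac{C_d}{2^d \sigma^{d/2}\sqrt{n}} \sqrt{1 + 2^{d+3}\int |y|^{d+3}\mu(dy) + 2^{d+3}\int |u|^{d+3} e_{\sigma/2}^{\otimes d}(u)\, du} \\
&\le \frac{C_d}{2^d \sigma^{d/2}\sqrt{n}} \sqrt{1 + 2^{d+3}\int |y|^{d+3}\mu(dy) + 2^{d+3}\sigma^{d+3} d\,(d+3)!}.
\end{aligned}
\]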



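In the end, assuming $\sigma \le 1$, we obtain

\[
\mathbb{E}[W_1(\mu_n, \mu)] \le \sqrt{8d}\,\sigma + \frac{C_d}{2^d \sigma^{d/2}\sqrt{n}} \sqrt{1 + 2^{d+3}\int |y|^{d+3}\mu(dy) + 2^{d+3}\sigma^{d+3} d\,(d+3)!} \le C(d, \mu)\big(\sigma + n^{-1/2}\sigma^{-d/2}\big).
\]

Taking $\sigma = n^{-1/(d+2)}$, we get the upper bound we were aiming for.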
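
To see why this choice of $\sigma$ balances the two terms, the short sketch below (an illustration only; the function names are ours) evaluates the trade-off $\sigma \mapsto \sigma + n^{-1/2}\sigma^{-d/2}$ at $\sigma = n^{-1/(d+2)}$ and at its exact minimizer $(d/2)^{2/(d+2)} n^{-1/(d+2)}$, obtained by setting the derivative to zero. Both scale like $n^{-1/(d+2)}$, so they differ only by a constant depending on $d$.

```python
def tradeoff(sigma, n, d):
    """Smoothing error plus fluctuation term: sigma + n^{-1/2} * sigma^{-d/2}."""
    return sigma + n ** (-0.5) * sigma ** (-d / 2.0)


def sigma_rate(n, d):
    """The choice sigma = n^{-1/(d+2)} made in the text."""
    return n ** (-1.0 / (d + 2))


def sigma_exact(n, d):
    """Exact minimizer of the trade-off, from d/d(sigma) = 0."""
    return (d / 2.0) ** (2.0 / (d + 2)) * n ** (-1.0 / (d + 2))


for n in (10, 1000, 10 ** 6):
    for d in (1, 2, 5):
        at_rate = tradeoff(sigma_rate(n, d), n, d)
        at_exact = tradeoff(sigma_exact(n, d), n, d)
        print(f"n={n:>7}, d={d}: rate choice {at_rate:.5f}, exact minimum {at_exact:.5f}")
```

At $\sigma = n^{-1/(d+2)}$ the two terms are equal, so the trade-off equals $2n^{-1/(d+2)}$.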